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grow and the performance to improve without an increase in software com? I 

Prototypes of MDBS ::e being implemented in order to carry out design verl .« 
cation and performance evaluation of MDBS. The types of design verification and 
performance evaluation of MDBS to be conducted are discussed in the report. The 
prototypes, will be developed in versions starting with a very simple version, i.e., 
MDBS-I, that is described in detail in this report. Four more versions are en- 
visioned. The rationale for each of the subsequent versions is also given. 

As the first in a series of reports on the implementation, this report dis- 
cusses the choice of hardware and operating system software. It also discusses 
the choice of the system programming language. 

The project is being used as an experiment in implementation methodologies and 
software engineering techniques. Thus, the report discusses the methodologies and 
techniques used, including a modified chief-programmer-team organization, struc- 
tured walkthrough, data and service abstractions, a formal systems design language, 
and structured coding. The choice of a f black-box T testing strategy is also dis- 
cussed . 

T he MDBS-I software system architecture is described in some detail. In 
particular, the portion of the system whicn processes the information aoout the 
catabase, i.e., the directorv data, is described. In order to use a database that 
already exists, a subsystem to convert and load the database will be provided. The 
database load subsystem is therefore described. Finally, in order to facilitate 
performance evaluation experiments, a program to generate test data is provided. 

The final section of the report provides a preliminary discussion of alter- 
native approaches for the operating system interface. Both a message-oriented 
approach and a procedure-oriented approach are examined for the purpose of sup- 
porting ccncurrency control of MDBS which is to be incorporated in the second 
version of MDBS, i.e., MDBS-II. 

The appendices contain the detailed designs for the directory management sub- 
system, the database load subsystem and the test data generation program. Later 
reports will describe subsequent versions of the multi-backend database system, 
namely, MIBS-II, MDBS-Ill, MDBS-IV and.MD3S-V. 
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1.0 AN INTRODUCTION TO HIGH-PERFORMANCE AND GREAT-CAPACITY DATABASE SYSTEMS 



Traditionally, database management systems run as large software pack- 
ages (e.g. , TOTAL) on large host computers (e.g., IBM 3033). Such systems 
have had problems with performance; i.e., as the database grows and the rate 
of requests to the database system increases, the host computer performance 
decreases. Instead of upgrading the host to a more powerful and expensive 
model (say,, IBM 3081) and incurring a major system interruption, it has been 
proposed [Cana74] to offload most of the database system software from the 
host to a second computer system, known as the backend, thus freeing the ex- 
isting host computer for other tasks. 

One backend approach is to use a single minicomputer for the backend. 
This approach can free up the host, thereby improving the system performance 
for other tasks. However, if the database continues to grow and the rate of 
requests continues to increase, this approach cannot solve the database per- 
formance problem since the backend will soon be overloaded. Consequently, 
its performance will be degraded just as the host's would have been in the 
traditional approach. Thus, overall performance of the host and backend will 
be degraded . This approach is known as the single sof twai e backend approach . 

A second approach to solving the database system performance problem is 
to develop a special-purpose database machine with specially designed 
hardware. However, the cost-effectiveness of this approach, known as the 
h ardware backend approach, has not yet been demonstrated. 

A third approach is to use multiple mini-computers configured in a novel 
and parallel way for performance improvement in order to allow for database 
growth and for increases in the request rate. This approach also requires 
the development of an innovative software design which allows the addition of 
mini-computers of the same type and the replication of the software on the 
new mini-computers without major system interruptions. Thus, it does not re- 
quire the development of any new hardware, but only the development of a new 
and replicatable software architecture and a new and parallel hardware 
configuration. Because it allows the use of multiple mini-computers, this 
approach will result in a multi-backend database system . 
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In this report we describe the current status of the development of a 
prototype of such a multi-backend database system known as MDBS. By a proto- 
type we mean one that has enough of the functionality of the full system to 
allow meaningful experiments to be conducted. However, some features that 
would be essential for a full system would be omitted in order to simplify 
the current implementation effort. The functionality provided and features 
omitted will be described in later sections. 



1 .1 Multi-Backend Database System Design Goals 

The major goals we are trying to achieve are to design a multi-backend 
database system that allows the database to grow and the rate of requests to 
increase while maintaining good overall performance. In particular, a "good" 
multi-backend database system with high performance and great capacity should 
have the following properties: 

(1) Throughput improvement is proportional to the number of backends. In 
other words, if the number of backends and disk drives is doubled, it 
should be possible to nearly double the size of the database and to 
nearly double the request rate on the database system. 

(2) Response time is inversely proportional to the number of backends. 
It should also be possible to nearly halve the average response time 
by doubling the number of backends. 

(3) System is extensible for capacity growth and/or performance improve- 
ment. By extensibility of a multi-backend database system we mean 
that upgrade of the system can be provided with no modification to 
existing software and no new programming; no modification to exist- 
ing hardware; and no major disruption of system activity when addi- 
tional hardware is being incorporated into the existing hardware and 
software . 

This kind of extensibility is to be provided by designing a system with 
one controller (i.e., the master mini-computer) and several backends (slave 
m i^i”Computer s ) where the design allows expansion by the addition of more 
backends of the same type, instead of by the replacement of the present back- 
ends with more powerful and expensive models. It also allows identical 
software to run on each backend, including new backends added for expansion. 
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This kind of extensibility calls for a design which minimizes the role of the 
controller of the backends so that it will not become a bottleneck after the 
addition of only a few backends. 



1.1.1 Design Issues 

Three types of design issues are addressed: hardware issues, system is- 
sues, and software issues. They are discussed in detail in [Hsia81a] and 
[Hsia81b]. Here we will review the hardware and system issues and solutions 
briefly. Since the software issues are closely related to our implementation 
effort, we will discuss their solutions more elaborately in the next section. 
In this chapter, definitions will be kept informal. More precise definitions 
can be found in Chapter 3 or in the previously mentioned reports, i.e., 
[Hsia81a] and [Hsia81b]. 

The hardware issues include the problem of backend interconnection — 
Should the backend3 communicate with each other via some kind of interconnec- 
tion hardware? How can this interconnection be provided in a cost-effective 
manner? The hardware issues also include the problem of database store 
interconnection — Should each disk be accessible by all the backends, by 
only one backend, or by seme but not all the backends? 

Several system issues are addressed: Database placement — Should re- 
lated records of a database be concentrated at one backend or should they be 
distributed across several backends? If the records are to be distributed 
across several backends, how should this distribution be done? Execution 
mode — should all backends process the same request in parallel or should 
different backends process different requests concurrently? Directory 
structure , placement . and management — How should the auxiliary information 
about the database be determined, organized and distributed among the back- 
ends? Access control capability — What are the kind and granularity of the 
access control and how should an access control mechanism be implemented? 
Data model and manipulatio n languag e — what data model should be supported 
and what data manipulation language should be used? 
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The software issues include the problem of degree of concurrency — 
Since the basic unit for processing is a request, should two or more requests 
be processed concurrently? If requests are not processed concurrently, what 
should be done when users submit groups (i.e., transactions) of requests? 
Can the processing of these requests be interleaved? As a part of the con- 
currency control issue the problem of consistency control and deadlock 
avoidance should be addressed — — How are the same data values of the database 
subject to concurrent processing by different requests to be kept consistent? 
How is deadlock to be avoided in an environment with multiple requests and 
concurrent processing? 



1.1.2 Solutions for a Multi- Back end Database System Architecture 

An overview of the resulting MDBS hardware organization is shown in Fig- 
ure 1. The issue of backend interconnection is resolved by having the con- 
troller and backends connected by a broadcast bus. The controller will 
broadcast each request to all backends at the same time ever this bus. 
Furthermore, there will be minimal broadcasting from one backend to the other 
backends. The issue of database store interconnection is resolved by giving 
each backend dedicated disk drives. 

The issues of database placement and execution mode are resolved by dis- 
tributing the data from each file across all the backends. Each backend will 
then process the data from its own disk drives. Because each file is spread 
across all the backends, all backends can now execute the same request in 
parallel. Request execution at a backend is handled by having a queue of re- 
quests at the backend. When a backend finishes executing one request it can 
start executing the next request. In view of the execution mode, MDBS is a 
multiple-instruction-and-multiple-data (MIMD) organizaton. 

The data model chosen for the system is the attribute-based data model 
[Hsia70]. In MDBS the database consists of files of records. Each record is 
a collection of keywords, optionally followed by a record body. A keyword is 
made of an attribute-value pair such as <S ALARY , $12 ,000 > where $12,000 is the 
value of the attribute SALARY. A record body is a string of characters 
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Figure 1. The MDBS Hardware Organization 



not used by MDBS for search purposes. An example of a record without a 
record body is shown below. 

( <FILE, Employ ee> , <EMPLOYEE_NAME , Smith > , <CITY, Columbus >, 

<SALARY , $12 ,000> , <SERVICE,10> ) 

The first attribute-value pairs in all records of a file are the same. In 
particular, the attribute is FILE and the value is the file name . For exam- 
ple, the above record is from the Employee file. 

For performance reasons, records are logically grouped into c lusters 
based on the attribute values and attribute value ranges in the records. 
These values and value ranges are called descriptors . For example, one clus- 
ter might contain records for those employed in Columbus, making at least 
$20,001 but not more than $25,000 and with at least 11 but not more than 15 
years of service. Thus records of this cluster are grouped by the following 
three descriptors: 

(CITY=Columbus ) , ( $20 ,001 =<SALARY=<$25 ,000 ) , ( 11=<SERVICE=<15 ) . 

Record retrieval in MDBS, for example, is done by clusters. Thus finding 
records of employees in Columbus making between $21,000 and $22,000 per year 

and with 12 to 13 years experience would require the retrieval of records in 

the cluster just described. Other requests such as to find records of em- 
ployees in Columbus making between $21,000 and $28,000 and with 12 to 13 
years experience might require additional retrieval of records from other 
clusters than the one identified above. 

In order to allow efficient processing of requests, records in a cluster 
are spread across all the backends. Thus each backend needs to search only 

its portion of the cluster. Given a user request, there must be a way, of 

course, first to determine which clusters to search and then to determine the 
location of records in a given cluster. To perform this task, MDBS utilizes 
available descriptor information. For example, given the previous request 
for finding employees where 

(CITY=Columbus ) and ( $21 ,000=<SALARY=<$28 ,000 ) and ( 12=<SERVICE=<13 ) 

MDBS first determines that two clusters must be searched. These are the 
clusters identified by the two sets of descriptors: 

{ (CITY=Columbus ) , ( $20 ,001 =<SALARY*<$25 ,000 ) , ( 11=<SERVICE=<15) > 

{ (CITY=Columbus) , ( $25 ,001 =<SALARY=*<$30 ,000 ) , ( 11=<SERVICE=<15) > 
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After the clusters are identified, MDBS must then determine the disk ad- 
dresses of the clusters at each backend. Finally MDBS will cause each back- 
end to retrieve from its disks the records so addressed. 

The execution phases of a retrieval request are summarized in Figure 2. 
Descriptor search determines the descriptors that correspond to the request. 
In our example, there are four descriptors corresponding to the request; 
namely , 

(CITYColumbus ) , ( $20 ,001 =<SALARY=»<$25 ,000 ) , 

($25 ,001= s <SALARY=<$30 ,000) , ( 11=»<SERVICE=<15) . 

In order to save space and to save processing time each descriptor is identi- 
fied by a descriptor id . For example, 



Descriptor 


Descriptor Id 


( CITY=Columbus ) 


D15 


( $20 ,001 =<SALARY=<$25 ,000 ) 


D125 


( $25 ,001=<SALARY=<$30 ,000 ) 


D126 


( 11=<SERVICE=<15 ) 


D250 



Thus the output of the descriptor search phase is the Boolean expression of 
descriptor ids 



D15 and (D125 or D126) and D250 
corresponding to 

' ( $20 ,001=<SALARY=<$25 ,000 ) 

° r f 

($25 ,001=<SALARY=<$30 ,000 )) 
which identifies two clusters. 



(CITY=Columbus ) and 



( 1 ) 

and ( 11- <SERVICE=<15 ) 



The next phase, cluster search must take the Boolean expression in (1) 
and actually determine the corresponding clusters. As with descriptors, 
clusters are also identified by ids, known as cluster ids , for example 



Descriptor Ids 


Cluster Id 


D15, D125 , D250 


C17 


D15 , D126, D250 


C22 
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The final two phases are address generation (to find the disk addresses, 
e.g., A354b and A3547, corresponding to each cluster id, e.g., C17) and 
record selection (to retrieve the actual records so addressed). 

Descriptor search, cluster search and address generation together form 
the major portion of directory management . 

Because all directory management is based on the concept of clusters, it 
is also logical to design an access control capability based on clusters. 
Thus cluster search is augmented by a cluster access control mechanism . 

The final design issue was the question of the degree of concurrency to 
be allowed. Executing one request at a time at a backend will frequently 
leave the backend's CPU idle while waiting for a disk to access records. 
Since the MDBS hardware organization provides multiple disk drives per back- 
end, it is possible for a backend to support concurrent processing of re- 
quests from different users. However a mechanism to control concurrent ac- 
cess to data must then be provided. The mechanism used in MDBS is also cen- 
tered on the concept of clusters. In particular, the concurrency control 
mechanism will lock clusters to prevent conflicting access to the same 
clustered data. 

This section has described the general method used by MDBS in processing 
a retrieval request. This processing is summarized in Figure 3. The next 
section will show how this processing is divided among the controller and the 
backends . 



1.1.3 Distribution of Request Execution Among Controller and Backends 

In the previous section, we mentioned how the database was distributed 
across the backends. However, we did not discuss the placement of directory 
data and the distribution of the processing required in directory management. 
In order to minimize the time for directory management and to facilitate 
record update, the directory data is duplicated at all backends. On the 
other hand, the processing required for directory management is not duplicat- 
ed at each backend. The descriptor search phase, instead, is divided among 
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the backends. Each backend must find only a subset of descriptor ids. It 
then broadcasts its results to all the other backends. In Figure 4 we sum- 
marize how directory management is performed at a backend. A retrieval re- 
quest is received from the controller. Then the backend performs a descrip- 
tor search on its portion of the request and broadcasts the resulting des- 
criptor ids to the other backends. After the descriptor ids from all other 
backends have been received, cluster search is used to determine the clus- 
ters. Finally, address generation determines the local disk addresses for 
records at that backend. 

The backend can do more than just retrieve all the records in a cluster. 
First, it can select those records that actually satisfy the request. For 
example, the request to find records of employees in Columbus earning more 
than $20,000 but not more than $28,000 and with more than 10, but not more 
than 15 years experience, requires selecting records from the two clusters. 
Those clusters are identified by 

(CITY=Columbus ) and ( $20 ,001=<SALARY=<$25 ,000 ) and ( 11=<SERVICE=<15) 

and 

(CITY=C olumbus ) and ( $25 ,001=<SALARY=<$30 ,000 ) and (11=<SERVICE=<15) . 

All the records will be selected from the first cluster, but only records 
with SALARV=<$28 ,000 will be selected from the second cluster. 

Often not all the data in a record is needed to respond to a request. 
In this example, only the names of the employees might be required. Thus the 
appropriate values must be extracted from the record. The other values may 
be discarded. Although not shown in this example, MDBS can perform various 
types of aggregate operations on a set of values instead of just returning 
the raw values. An example would be to find the average salary of employees 
who live in Columbus. Thus after selecting the appropriate records and ex- 
tracting the salary values, MDBS would compute the average. The steps of 
record processing are summarized in Figure 5. 

Referring to Figure 6, the execution of a user request can now be sum- 
marized as follows. The user submits a request to the host, which then 
transmits that request, in an internal form, to the controller of MDBS. The 
controller parses the request and then broadcasts it to the backends. The 
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Broadcast Mode 

• Controller-to-all-backends operation (e.g., query) 

• Backend-to-all-other-backends operations (e.g., transferring 
descriptor ids) 

Parallel Mode 

• Response-of-each-backend-to-controller operations (e.g., forwarding 
retrieved data) 



Figure 6. Modes of MDBS Operations 
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backends determine their portion of the descriptor ids and broadcast the re- 
sults to the other backends. Each backend detenaines the clusters that must 
be searched and the corresponding local disk address es. Then the appropriate 
records are selected, values extracted and results sent back to the 
controller. When the controller has received the results from all the back- 
ends, it performs any aggregate operation required and then forwards the 
final results to the host for return to the user. 



1 .2 Why Implement This System? 

The design of MDBS is based on extensive analysis of queueing models and 
simulation studies of MDBS components. These results are included in 
[Hsia81a] and [Hsia81b]. This report is concerned with the implementation of 
an MDBS prototype. We, therefore, will not repeat the expected performance 
of MDBS as simulated and analyzed in those reports. These models and studies 
are, of course, only approximations. We are implementing a prototype of MDBS 
in order to conduct more accurate performance ev« luation and more thorough 
design validation. 



1.2.1 Validation of Simulation Results 

The first reason to build a prototype system is to validate the simula- 
tion results. The main goal is to measure the extensibility of the system, 
i.e., how does it perform as more backends are added? In particular, is the 
performance gain proportional to the number of backends? If this proportion- 
ality holds for a small number of backends, how many backends can be added 
before no more improvement is possible? Can the response time, indeed, be 
improved for the same size database by increasing the number of backends, 
each with a smaller number of disk drives as is predicted by the simulation? 
The simulation models used to develop the design predict improved performance 
with an increase in the number of backends and the same amount of data. They 
also'predict constant performance with an increase in data, if the number of 
backends is increased. 
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(A) System Evaluation with Program-Generated Databases 

The first set of experiments will use test data that is generated by 
programs and specified by experimenters* The record formats will be deter- 
mined by the experimenter* The actual data will then be generated from dis- 
tributions specified by the experimenter. For example, one file might have 
10,000 records each with 10 fields. The value in the first field of a record 
may be drawn from a uniform distribution on the interval [0,100]; the second 
field of a record may be drawn from a predefined set of values, while the 
third field might come from a normal distribution. The number of records and 
their formats can be varied in the experiments. 

Requests will also be constructed in a similar way. This approach is 
taken first because it is easy to perform these experiments. However, we 
also intend to run experiments on actual databases borrowed from the Depart- 
ment of Defense's user community. 

(B) System Evaluation With Actual Databases 

The validation of the simulation will also use data taken from an actual 
database. Thus the second step will be to obtain one or more actual data- 
bases. Sets of "typical" requests will then be developed on the basis of the 
data languages of the databases. These databases and sets of requests will 
be used for a second set of experiments. It is hoped that such experiments 
will provide more insight into how a multi-backend system might actually per- 
form. Furthermore, it will provide insights into the relative performance of 
the multi-backend system vs. a single-backend system and vs. a conventional 
system. 



1.2.2 Towards a Methodology for Database Applications Classification 

After experimenting with several actual databases, our goal is to devel- 
op a methodology for classifying database applications. With such a metho- 
dology it should be easier to predict the performance of a new application on 
an existing multi-backend system. Such a classification could also be used 
in the redesign of the multi-backend system, since it would allow much more 
accurate simulation of system performance. Right now, only two gross appli- 
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cation classification schemes exist* One is to distinguish between 
"query- intensive” and "update- intensive** applications. In the first case 
most requests only seek information from the database, while in the second 
case most requests require addition and modification to the database. 

A second classification scheme involves the complexity of the queries. 
For example, some queries are very simple, e.g., finding the address of the 
employee whose employee number is 123456. Other queries are much more com- 
plex, e.g., finding the names and addresses of all employees who live in Co- 
lumbus, earn between $20,000 and $32,000 per year and have worked for the 
company for at least 10 years. There are still more complex queries which 
require reterence to more than one file. It seems likely that some designs 
will provide better performance on simple queries, while other designs will 
provide better performance on more complex queries. These class if icatons 
need to be made more precise. Still other classif icaton schemes need to be 
developed . 



1.2.3 Bench-Marking the System Performance 

A well-known method for comparing the relative performance of computer ° 
systems is to compare the average execution time of a standard instruction 
mix [Ferr78]. One such mix, the Gibson mix [Gibs70], was derived from the 
average relative usage of IBM 7090 CPU instructions in a scientific environ- 
ment. Similarly, this approach has been applied to high-level programming 
languages. One such mix, [Knut7l], was collected for the average relative 
usage of Fortran statements. Once such a mix has been developed, it can be 
used to estimate the performance of a new computer system by first determin- 
ing the execution time of each instruction type and then computing the 
weighted average execution time for the typical mix of instructions. 

This same technique may be generalized and applied to the performance of 
database systems. Corresponding to a standard CPU instruction mix would be a 
mix of low-level database processing statements such as the requests provided 
by MDBS. Corresponding to the high-level programming statement mix would be 
a mix of high-level query language statements provided by a language such as 
SQL [Astr76]. The relative mix of MDBS requests or SQL statements would be 
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determined by examining several typical database applications. This mix 
could be used to estimate the performance of a new database system after the 
execution time of each type of MDBS request or SQL statement is known. 



1 *3 The Implementation Strategy - What and Why? 

It seems only reasonable to develop most systems in stages. For proto- 
type systems such an approach seems even more important. Thus we plan to de- 
velop several versions of MDBS. We chose to begin with an implemention of a 
very simple system. 



1.3.1 Version I - A Very Simple System: Single Mini Without Concurrency 
Control and With Simplified Directory Management 

The system we are now implementing is intended to be as simple as possi- 
ble. The aim is to get something running so that we can gain some experience 
with both the MDBS design and our new computer systems. Thus we have chosen 
to simplify the design as much as possible. MDBS-I will execute only a sin- 
gle request at a time. It will run on a single computer, i.e., a PDP11/44. 
There is no distinction made between the slave and master. In other words, 
there is no separate controller. Directory management will be simplified by 
storing all directory data in the main memory. There will be no concurrent 
execution of requests. Since the whole system will run as a single operating 
system process, the interface with the operating system will be minimized. 



1.3.2 Version II - A Simple System: Single Mini With Concurrency Control 

The second version will allow concurrent execution of requests, but will 
still be restricted to a single mini. We plan to use the services of our op- 
erating system to facilitate this concurrent processing. Thus we will use 
the capability of creating independent concurrent processes which communicate 
among themselves. These processes will execute in parallel so that MDBS-II 
will be able to execute requests in parallel. This version will allow us to 
gain experience with the problem of multiple processes and the problem of 
concurrency control. 
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1.3.3 Version III - The First "Real" System : Multiple Minis With Concur- 
rency Control 

After MDBS-II is working, we will transfer the system to our real envi- 
ronment including a controller (i.e., VAX 11/780) and several backends (PDF 
ll/44s). This transfer should be fairly easy, since the major changes re- 
quired will be to replace communications between processes in one computer by 
communications between processes running on different computers. This ver- 
sion will allow us to see how the intercomputer communication overhead is 
going to affect system performance. This system, MDBS-Ill, will still not be 
sufficient for a full MDBS, since it has a very simplified directory manage- 
ment subsystem. However, it will allow us to begin preliminary testing of 
the MDBS design. 



1.3.4 Version IV - The Real System With "Good 11 Directory Management 

This version will include a fully implemented directory management sub- 
system utilizing the secondary memories. It will be a complete prototype 
system, except for the lack of access control features. This system, 
MDBS-IV, will be the one on which we will try to validate the simulation stu- 
dies used in the development of the original design. 



1.3.5 Version V - The Full System With All the Designed Features Included 

The final version will incorporate access control in the backends and a 
friendly user-interface in the controller or host computer. 



1 .4 The Organization of the Rest of the Report 

The rest of this report summarizes the design and implementation deci- 
sions that have been made, the software engineering approaches that have been 
selected and used, and the current status of the implementation. 
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1.4.1 Preparations for the First Effort of the Laboratory for Database Sys- 
tems Research 

This project marks the first implementation effort of the Laboratory for 
Database Systems Research. Before the implementation effort can begin, it is 
necessary to select the hardware to be used, both for the controller and for 
the backends; the implementation language; and the operating systems. The 
choices made and the rationale for the choices are discussed in the beginning 
of Chapter 2. 



1.4.2 Software Engineering Approaches to the First Effort 

Because the development of the prototype MDBS is our first implementa- 
tion effort, we have been using this development as an exercise in implemen- 
tation techniques. The actual implementation of any software system goes 
through several phases including specification, design, coding and testing. 
At present, the specifications and high-level design of MDBS have already 
been completed. We continue with the detailed design phase. Specific tech- 
niques for the detailed design, coding and testing phases have been adopted. 
These techniques are described in Chapter 2. 



1.4.3 The Implementation Status 

The implementation of MDBS-I is well underway. We expect the entire 
system to be operational in the spring of 1982. That implementation is 
described in detail in Chapters 3, 4 and 5. The directory management portion 
of the system is completed. We have also completed a utility, database load , 
to perform the loading of a database. Finally, a package to generate files 
of test data is also completed. 

In addition to directory management, database load utility and the test 
file generation package, some preliminary work has been done on the approach 
to be taken for concurrency control. These preliminary results are discussed 
in Chapter 6. 



PAGE 21 



2.0 THE PROJECT PLANNING AND THE IMPLEMENTATION EFFORT AND STRATEGY 

Before any effort toward implementing the MDBS prototype system can 
begin, many decisions are required. Project planners must choose the 
hardware for the prototype system. In particular, they must decide upon the 
minicomputers for the controller and the backends. Then the systems program- 
ming language must be selected. Finally, the operating systems must be cho- 
sen. The implementors must decide on an implementation strategy. They must 
develop a plan not only for what is to be done, but also for how it is to be 
done. The "what'* of this strategy is discussed in Section 1.3, which 
describes the five phases of the implementation strategy for MDBS. The "how 11 
of the strategy requires the selection of software engineering techniques to 
be used in the implementation effort. 

The primary goal of the implementation effort is to develop a prototype 
of MDBS to be used in database systems research. Some future directions for 
this research are presented in Section 1.2. Our goal also requires the 
software development effort to generate reliable software in as short a time 
period as possible without sacrificing the reliability and quality of the 
software. In succeeding sections of this chapter, we will document and ex- 
plain the decisions made by the project planners and implementors during the 
preparatory stages of the MDBS implementation effort. We will show how their 
choices of hardware and systems software and software engineering techniques 
are related to the goals of the implementation effort. 



2.1 The Choice of Hardware and Systems Software 

Project planners have to address three fundamental questions in prepara- 
tion for the implementation effort: 

(1) What kind of hardware should be used in the 
multi-backend database system as depicted 
earlier in Figure 1? 

(2) What systems programming language should be 
selected for the MDBS development effort? 
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(3) What operating systems should be used in order 
to best support the MDBS features? 

We will review the alternatives which the project planners considered, and 
give the reasons for their choices. 



2.1.1 The PDP11/34 vs, the PDPll/44 and the PDP11/70 vs. the VAX11/780 

Project planners want to select hardware which satisfies the require- 
ments of the MDBS hardware organization at the smallest price. The MDBS 
hardware organization is shown in Figure 1. That organization assumes that 
the backends are connected by a broadcast bus. It also assumes that the ded- 
icated disk drives at the backends have the capacity to support very large 
databases. In addition to the MDBS design requirements, project planners 
must consider that the development effort for an MDBS prototype will probably 
require more computing power than the computing power required to run the 
prototype. 

Since the planners anticipated an equipment grant from Digital Equipment 
Corporation (DEC), a proposal for DEC equipment was drafted. The proposal 
suggests that the most cost-effective selection of hardware would be 
PDPll/34s for backends and a PDP11/70 for the controller. At the time this 
proposal was drafted, the latest generation of the corresponding DEC minicom- 
puters was represented by the PDPll/44 and the VAX11/780. 

In a multi-backend database system, performance is improved by increas- 
ing the number of backends. MDBS is designed to be easily extensible, as ex- 
plained in Section 1.1, so that no significant software development or 
down-time costs are incurred in expanding the system. The greatest expense 
incurred will be the cost of the hardware. Therefore, the cost of adding 
backends to a system is an important measure of cost-effectiveness. In 1979, 
the PDP11/34 minicomputer was the least expensive model in the PDP11 series 
which supports large-capacity, hard disks and can be interconnected with 
DEC's Parallel Communication Bus (PCL) . Using PDPll/34s as backends will 
minimize expansion costs for MDBS. Hardware cost is less important in se- 
lecting the controller than in selecting the backends. The PDP11/70 can fur- 
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nish additional computing power required to support the development effort at 
a reasonable cost. 

DEC's response to the original proposal was that since the ultimate goal 
of the implementation effort is database systems research and not product de- 
velopment, the latest technology available should be used. Although newer 
equipment may be more costly, it may also enhance the research and implemen- 
tation effort. The final agreement, therefore, shows: PDPll/44s are used as 
the backends; the VAX11/780 is used as the controller and to support the 
program development effort; and the PCL is used to interconnect the 
VAX11/78C and PDPll/44s for the purpose of simulating the broadcast and par- 
allel transfer capabilities. (See Figure 6 again.) 



2.1.2 The Systems Programming Language 

A systems programming language for the MDBS implementation must be 
powerful yet relatively easy to use. In other words, the language must have 
enough constructs to program the features for the multi-backend database sys- 
tem discussed earlier in Section 1.1.2. It is also important to choose the 
programming environment and language constructs which will make the develop- 
ment effort easier. The implementation team for MDBS is composed primarily 
of computer science students who have little practical experience, although 
they have a broad base in textbook knowledge. A systems programming language 
which makes the development effort easier will help these relatively inexper- 
ienced implementors to develop more reliable software. 

Systems programming languages can be evaluated in terms of: 
availability, portability, and vendor support; programming environment and 
language features; and reliability and efficiency. Project planners exam- 
ined three systems programming languages; Bliss, C, and MAINSAIL. A brief 
evaluation of each language and a summary of the reasons for choosing C fol- 
low this section. Some important language features and issues which are ad- 
dressed in the evaluations are also explained in the following sections.* 

* We wish to thank Jos£ Alegria and Tom Bodnovich for the background 
material which they contributed for this section. 
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A data abstraction is a group of related functions or operations that 
act upon a particular class of objects. Users of an object represented by a 
data abstraction are constrained to use only the operations defined in the 
abstraction. The classic example of a push-down stack as a data abstraction 
includes operations to create new stacks, to "push" data onto a stack, to 
"pop" data off the top of a stack, and to test for stack-full and stack-empty 
conditions. This technique is useful in enforcing data integrity and in con- 
trolling concurrent operations on shared data. Such a language feature will 
be a useful way to implement solutions for one of the design issues for a 
multi-backend system mentioned in Section 1.1.1, i.e. the software issue of 
degree of concurrency. 

Another useful feature in a language is some mechanism for 
type-checking . Such a mechanism assures that the data types of the operands 
in an expression (or subexpression) are compatible with the operation which 
is to be pertormed. Type-checking contributes to the overall reliability of 
the software. The issue of reliability of a language involves whether or not 
the instructions in a language actually do what they are purported to do by 
the language designers and compiler writers. Clearly an unreliable language 
leads to unreliable software. Remember that a reliable prototype of MDBS is 
our goal. 

(A) The Bliss Language and Its Compilers 

The Bliss language [Wulf71] was originated in the Department of Computer 
Science at Carnegie-Mellon University. Dialects of Bliss are available from 
DEC for PDP11 and VAX systems, but there are significant differences between 
the dialects. Another limitation is that object code for the PDP11 must be 
generated by a cross compiler running on a larger computer system. 

There is no set of programming tools for Bliss programmers, so the pro- 
gramming environment is poor. Bliss is an expression-level language. In its 
syntax, all identifiers denote addresses rather than values, so a 
de-reterence operator ('.') must be used. For example, 

a = .a + .b 
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is a valid Bliss instruction which, when executed, adds the values at the 
addresses represented by identifiers a and b and stores the result at address 
a. This notation makes it difficult for the uninitiated to write or read 
Bliss code. The language supports no primitive data types. Since operators 
are never type-specific, type-checking is non-existent. Nevertheless, an ad- 
vantage of Bliss is that it supports the data abstraction concept. 

There is some question as to the reliability of Bliss, since it contains 
so many low-level features. It does, however, seem to be the best of those 
languages surveyed when measured in terms of time/space efficiency on the DEC 
equipment . 

(B) The C Language and Its Programming Development Environment 

The C language [Kern78| was originally designed for and implemented with 
the UNIX operating system [Ritc74] for the DEC PDP11 . UNIX is a Bell Tele- 
phone Laboratories trademark, and UNIX operating systems are licensed by 
Western Electric. C, however, is not tied to any particular operating system 
or architecture; C compilers are available on many systems. Not all ver- 
sions of C are compatible, so portability can be a problem. C is supported 
with all versions of UNIX, and is available from the Digital Equipment Com- 
puter Users Society (DECUS), for use with PDP11 and VAX operating systems. 

A rich set of program development tools usually accompanies UNIX system 
software. These tools provide a very good environment for C programmers. C 
syntax is very simple. The language supports primitive data types such as 
integer and character; type-checking, however, is not strongly enforced. C 
compilers usually do not support extra features, such as sophisticated macro 
processing, but many of these features are available in the programming envi- 
ronment support provided with UNIX. C, unlike Bliss, does not support the 
data abstraction concept. 

C is reasonably reliable, even though many vendors do not commercially 
support the language. It is also reasonably efficient. A good textbook for 
C users is [Kern78]. 
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(C) The MAINSAIL Language and Its Relationship to the Other Languages 

MAINSAIL (MAchine INdependent SAIL) [Wilc77 ] evolved from the program- 
ming language SAIL, which was developed in the late 1960s at Stanford Univer- 
sity's Artificial Intelligence Laboratory. XIDAK, Inc. owns exclusive 
rights to develop and market MAINSAIL. The language is distinguished by its 
portability. The same compiler and runtime system, both written in MAINSAIL, 
are the basis for every implementation; code generators and procedures which 
interface to the operating system must be specially written. MAINSAIL is im- 
plemented for DEC PDP11 systems. 

MAINSAIL was developed and is marketed with a set of integrated tools 
for program development. The syntax of the language is similar to ALGOL-60. 
Consequently, it appears familiar to most people with formal training in com- 
puter science. The set of data types supported is more extensive than that 
supported either by Bliss or by C , and there is strong type-checking. On the 
other hand, the major disadvantage of the language is that there is no capa- 
bility to invoke subroutines written in a language other than MAINSAIL or an 
assembly language. 

Reliability is rated good. Efficiency, however, is rated lower than 
that of either Bliss or C. Low-level features must be coded in an assembly 
language, which implies that two development languages must be learned rather 
than one. 

(D) Why Do We Choose the C Language? 

The efficiency ratings of MAINSAIL and the requirement that low-level 
features be coded in an assembly language quickly eliminate that language 
from consideration. The real choice, then, lies between Bliss and C. C has 
a number of features which make it more desirable than Bliss. 

First, C is a smaller language and has a simpler syntax. Given the 
inexperience of the implementation team, it is important to choose a language 
which can be easily learned. Next, the programming environment which can be 
provided under UNIX for developing C programs is a major consideration. A 
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third factor is that the compilation process for Bliss would require the re- 
sources of a computer system at considerable distance from the Laboratory for 
Database Systems Research. The Laboratory initially has only two PDP11/44 
systems, which are not large enough to support the Bliss compiler. The re- 
sources of a DECSYSTEM20 are available through the Department of Computer and 
Information Science, but it is neither convenient nor practical to set up the 
required communication links and procedures, since the alternative, using the 
C language, is acceptable. C, then, is the language we choose, since it can 
make the greatest contribution toward the goal:3 of the implementation effort. 



2.1.3 The Operating Systems 

Important considerations in choosing the operating systems are system 
performance, suitability of the operating system features for the MDBS appli- 
cation, and suitability of the operating system features for the development 
effort. System performance is critical if the design goals for a 
multi-backend system are to be met. The first two of the three design goals, 
which are explained fully in SectioD 1.1, are: 

(1) Throughput performance proportional to the number of 
backends . 

(2) Response time inversely proportional to the number 
of backends. 

Suitability for the MDBS application is related to the software solutions 
described earlier in Section 1.1.2. Operating system features must support 
the solutions selected for design issues such as degree of concurrency. 
Suitability for the development effort relates to the implementation goals to 
develop reliable software and to effectively manage the development effort. 
An operating system which is easier for relatively inexperienced programmers 
to use will be more suitable for development. Both UNIX and RSX11 are ana- 
lyzed with these considerations in mind. 

(A) The UNIX Operating System 

UNIX [Ritc74] is a very M user-f riendly" operating system. Interactive 
programs which teach the user how to use operating system facilities are a 
part of the UNIX package; all documentation is available on-line. A variety 
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of aids to C programmers are available* An example is the program "Lint", 
which checks C programs for syntax errors, such as type violations, which are 
not checked by the C compiler. 

The characteristics mentioned above make the UNIX environment desirable 
for program development* UNIX does, however, lack some system features which 
are required for the MDBS implementation. For example, the UNIX file system 
would not be satisfactory for our purposes; we would have to write a com- 
plete new input/output subsystem. 

(B) The RSX11 Operating System 

RSX11 is a DEC real-time operating system. Since real-time systems are 
engineered for execution speed, RSX11 is desirable from the performance 
standpoint. RSX11 also provides more flexibility; implementors can choose 
which operating system features to use. RSX11 also has a variety of features 
such as message passing which will be useful in implementing software solu- 
tions for concurrency control and backend intercommunication. RSX11 provides 
a less desirable programming environment than UNIX, due to the limited set 
of programming aids which are available through DECUS. 

(C) Why Do We Choose the UNIX Operating System for the 
Development Effort and RSX11 for the Run-Time Effort? 

The above discussions make it clear that, while UNIX is more favorable 
for MDBS development, RSX11 is more suitable for MDBS applications. An addi- 
tional factor to be considered is that C language programs are portable from 
UNIX to RSX11 with only minor conversion. Furthermore, both UNIX and RSX11 
are available for the PDP11/44 and the VAX11/780. Thus, project planners in- 
tend to take advantage of the best features of both systems. UNIX is to be 
used in the development effort, i.e., for programming the MDBS procedures. 
RSX11 is to be used for research purposes. The MDBS procedures when complet- 
ed, will be put together and run with the RSX11 operating system as the final 
MDBS. It will be used to validate the results of the MDBS simulation studies 
- one of the research directions discussed in Section 1.2. 
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2.2 The "How 11 of the Implementation Strategy 

The software development life cycle is commonly described in stages as 
follows : 

(1) Requirements analysis 

(2) Specification 

(3) Design 

(4) Coding 

(5) Testing 

(6) Operation and maintenance 

It is sound software engineering practice to choose specific techniques to be 
used throughout the system life cycle. The software engineering objectives 
are to enhance the reliability of the software which is developed and to pro- 
vide continuity throughout the life of the system. A further objective for 
the MDBS implementation effort is that the implementation should proceed as 
quickly and effectively as possible, since the eventual goal is to do re- 
search using the prototype system. 

The MDBS implementation begins with stage two, since stage one, require- 
ments analysis, is largely completed. The implementation strategy presented 
earlier in Section 1.3 details the development of five versions of MDBS. 
Each version after the first will be based in part upon some previous ver- 
sion. Furthermore, these multiple versions may not be developed in chrono- 
logical order; the implementation team can be working on more than one ver- 
sion at the same time. Therefore, it becomes especially important to select 
specific software engineering techniques for the design, coding, and testing 
stages of the sottware development effort. These techniques should be se- 
lected to provide the best possible project management techniques, design and 
development tools, and documentation for the life cycles of all of the ver- 
sions of MDBS. The techniques to be used in the MDBS implementation effort 
are described in succeeding sections of this report.* 



2.2.1 Team Organization and Monitoring the Development Effort 

Two issues in management strategy are specifically addressed in the 



* We would like to thank Tamer Ozsu for the background material 
which he contributed for this section. 
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choice of software engineering techniques for the MDBS implementation effort. 
First, how should the group be organized? Second, what specific techniques 
should be adopted to monitor the development effort? 

(A) A Modified Chief-Programmer-Team Organization 

The classic chief-programmer team [Mill71] is headed by a project 
leader, the chief programmer, who has absolute decision-making authority. 
Other permanent members of the team include a senior-level backup programmer 
and a librarian. Additional programmers may be added as necessary. 

The c hief programmer does all the design work and writes all of the 
critical sections of code, for example the routines for subsystem interfaces. 
The backup programmer is an understudy for the chief programmer, and partici- 
pates in design and coding; he takes over if the chief programmer leaves the 
team. The librarian maintains the group's program libraries and coordinates 
the documentation effort. 

One advantage of such an organization is that, since the levels of com- 
munication between team members are minimized, development is likely to 
proceed at a faster pace than with a decentralized organization. Also, the 
system which is developed is likely to be more coherent and consistent since 
it is designed primarily by one person. By selecting this organization, we 
enhance the reliability and speed of development, in accordance with our 
software engineering objectives. 

The MDBS implementation group is organized as a modified 
chief-programmer team. The entire effort is headed by a team supervisor. 
Separate teams are organized for each subproject being developed; each of 
these teams is composed of a chief programmer, one backup programmer and one 
or more programmers. A second organization chart of the group, depicted in 
Figure 7, shows three such teams working on directory management, test file 
generation, and database load. 




a. The Organization as of 6/15/81 



TEST FILE GENERATION 



Chief : 


D.S. Kerr 


Backup : 


P. Strawser 


Programmers : 




R. Browder 

S. Barth 




Chief: 


p. Strawser 


Backup : 


M. Higashida 


Programmers : 




Z. Shi 
R. Browder 
D.S. Kerr 



Chief: 


A. Orooli 


Backup : 


X. He 


Programmers : 




J. Bendig 




S. Barth 



b. The Organization as of 10/1/81 



Figure 7. The Organization of the MDBS Design 
and Implementation Teams 
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(B) The Structured Walkthrough 

A structured walkthrough is a formal review of the software development 
effort at a given stage in its development cycle. The work is reviewed by a 
walkthrough committee, with the purpose of finding any errors that may be 
present. The purpose of a walkthrough is not to solve problems, only to 
identify them; neither is a walkthrough a management tool to evaluate any 
employee's performance. 

Each member of the walkthrough committee has a well-defined role. A 
coordinator organizes and runs the meeting. The presentor , the originator of 
the work, presents his work to the group and answers any questions. The 
reviewers examine the material before the walkthrough is held, and, during 
the walkthrough, present their findings. A scribe records the proceedings. 
Each member votes on the outcome; the material may be accepted as presented, 
accepted with revision, or returned for revision and subsequent walkthrough. 

The MDBS group uses this technique at the design stage and at the coding 
stage. All detailed program specifications and source code are reviewed by 
walkthrough committees. These committees are chosen to include members from 
more than one chief-programmer team. This practice contributes to a more ef- 
fective walkthrough, since not all participants are involved in the develop- 
ment of the material being reviewed. It is also valuable in cross-training 
team members in areas other than those to which they are currently assigned. 
The status of a task can be determined by reviewing the walkthrough reports 
for that task. Figure 8 shows a sample walkthrough report. A good reference 
describing the structured walkthrough technique is [Your79]. 



2.2.2 The Design and Coding Stages of the MDBS Life Cycle 

During the design and coding stages of the software development life 
cycle, the detailed program specification is developed, and code in some pro- 
gramming language is generated from the program specification. The design 
strategy and methodology and the approach to coding must be carefully chosen 
to be complementary. A design strategy is selected first; then a design 
methodology with which to implement the strategy is chosen. The approach to 
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WALKTHROUGH REPORT 

Coordinator! 
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XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXIXXXXXXXXXXXXXXXXXtXXXXXXXXIXXXXXXX 
Coordinator ' t Chacklist. 
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TIME .jLi.i ¥ DURATION 



Participant 

1 . 

J. 
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5 . .1 

4. t 



Rola 











Can 



Hat 



Attand hatariai 



Initia 1 s 

.SLUL 




/ 
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Atanda: 

la All participants aarat to follow tha (taaa!) sat of rula*. 

-• 2a Haw rroJact! walkthrough of aatarial. 

Old rrojact: ita»-by-itaa checkoff of previous action list 

— 3a Creation of nay action list (cont ribut iont b* aach participant). 

4. Group dacition* 

— 3# Dalivar copy of thit fora to project lanaiaiant. 

xxxxxxxxxxxxxxxxxxxttxxxxxxxxxxxxxxxxxxxxxxixxxxxxxxxxxxxxxxxxxxxxxxxxx 

Dacition: --^Accept product at-it 

y£ Ravita (no further walkthrough) 

Ravita and echeduie another walkthrough 
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Notes: 

Refer to Figure 7a, which shows the MDBS organization 
chart in effect at the time this walkthrough was held. 

Note that three of the four chief-programmers are represented 

in this walkthrough committee. 

This module is a part of the test file generation 
programming task. 



Figure 8. A Sample Walkthrough Report 
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coding follows logically from these decisions. A top-down design strategy, 
implemented in a formal system specification language, and a structured cod- 
ing technique are used in the MDBS implementation effort. 

(A) A Top-Down Design Strategy and the Use of Data Abstraction 

A top-down design strategy is a natural choice for MDBS. The design and 
analysis study in [Hsia81a] and [Hsia81b] clearly describes the top level of 
design. It also suggests the possibility of functional decomposition, i.e., 
the entire system can be broken into discrete functional units. This idea is 
supported by Section 1.1.2, which describes a multi-backend system architec- 
ture and summarizes the execution phases of a retrieval request, as depicted 
in Figure 2. Directory management, an example of a functional unit, includes 
the descriptor search, cluster search, and address generation phases of re- 
quest execution. 

At a lower level, one concept, data abstraction, is borrowed from the 
bottom-up design approach. Since MDBS is being developed as a prototype sys- 
tem and is to be used to research performance evaluation, we anticipate that 
data structures will be routinely modified in attempts to measure the effect 
of different data structures on system performance. The data abstraction al- 
lows us to separate the basic system functions from the data structures, min- 
imizing the effect on the system when a data structure is modified. 

(B) A Formal Systems Specification Language (SSL) 

The design methodology which the MDBS implementation group uses is a 
systems specification language (SSL) modeled on the program description 
language (PDL) described in [Ling79] . The SSL adopts the same basic con- 
structs as that PDL. The SSL is characterized by a formal "outer syntax" and 
an informal "inner syntax". It supports the outer-syntax constructs required 
for a structured design methodology - sequence, decision, and iteration. 
Below is an example of the if-then-else decision construct. 
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if expression 

then statement sequence 
else statement sequence 

endif ; 

The underlined words represent the formal outer syntax. The other words 
represent the informal inner syntax; the only requirement for this inner 
syntax is that it must be understood by all team members. 

In addition there are constructs for the expression of the different 
levels of program execution: job, module and procedure. A job is at the 
highest level of program execution. Test file generation described in 
Chapter 5 and documented in Appendix B, for example, is a job. A procedure 
is at the lowest level of program execution. It corresponds to the usual no- 
tion of a subroutine. Procedures are invoked to perform some work on some 
input data and produce some output. However, they are not allowed to retain 
data between invocations. Figure 9 shows a typical SSL procedure specifica- 
tion. More examples of SSL specifications using other constructs can be 
found in the appendices of this report. 

Above the level of procedures, we have the level of modules. A module 
is intended for the implementation of a data abstraction. It consists of the 
procedures and data structures implementing the abstraction. An additional 
construct, the subsystem construct, is added to support the idea of function- 
al decomposition. In other words, each job may perform several functions, 
each of which is a subsystem. Thus, subsystems are at the second highest 
level of program execution. Directory management described in Chapter 3 and 
documented in Appendix I), for example, is a subsystem, as is database load 
described in Chapter 4 and documented in Appendix C. The job for both direc- 
tory management and database load is, of cour.se, the MDBS. 

We may also introduce one more construct, the concurrent construct, to 
allow the designers the capability of expressing the notion of concurrent ex- 
ecution, including concurrent execution at different backends. For example, 
directory management may be executed on all backends concurrently, while da- 
tabase load executes on the controller. 




Comments for programs 
statements immediately 
above 



FOURTH LEVEL SPECIFICATION FOR DATABASE LOAD 
VERSION 2* September 16* 1981 
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4.10.21.1- EEOC 



LIST_TYPE-C_ATTR_NAMES /* TYPECLST (DBL1113) %/ 

(input! type-C_ 3 ttr_naaes* 
atpointer ) * 



/ X List all the attribute naaes over which tape-C descriptors %/ 
/% are to be defined. Input is a list for attribute naaes X/ 
/t over which tupe-C attributes are to be defined* arid a %/ 
/% pointer to the AT. */ 



w 



4.10.21.2 

tt 



scalar 



/t Index to list of attribute names. %/ 



index* 
attr_name* 

duplicate* /% Indicator - TRUE or FALSE. 1/ 

dditpointer*/* Pointer into DDIT returned from ATM*/ 
/t FIND function. %/ 

descr.tape* /% A* B* C* or N0TF0UND. %/ 



4.10.21.3 

4.10.21.4 



4.10.21.5 

4.10.21.6 

4.10.21.7 

4.10.21.8 



index ! = 1* /% Null indicates end of list. 

tape-C_attr_naaesCindex] != null* 

, . , . „ , . , ^Outer svntax elements are 

wbile_aore_taPe-C descriptors do ^ under lined. They are the 

bedio-— — =— f y 



.SSL constructs. 



4.10.21.9 



4.10.21.10 

4.10.21.11 

4.10.21.12 

4.10.21.13 
4, 10.21 , 14** 
4.10.21.15 



Set attr_name fro* terminal* 
eerfar* ATrtfFIND(3ttr_name* 

dditpointer* 
pointer to descr_tape)> 

if a tupe-A or tape-B descriptor is alreada defined 
over this attribute name 
/% descr.tape not = N0TF0UND */ 
ibea , 

displaa error message* dinner syntax elements 

else 
beaio 

duplicate = FALSE* 
eerfor* SEARCH_TYPE-C_ATTR_NAMES 




4.10. 
4.10. 
4.10. 
4.10. 
4.10. 
4. 10. 
4.10. 
4.10. 



21.16 

21.17 

21.18 

21.19 

21.20 
21.21 
21.22 
21.23 



( tupe-C_3ttr_names* 
attr_name* 
duplicate) * 

if duplicate is FALSE 
tbeD 

beSio 



tape-C_attr_na*esCindexl 5= attr_naae* 
index l~ index + 1* 
tape-C_attr_na*esCindexl != null* 
eod-if * 
eod.if * 




4.10.21.24 



eod.ubile* 



4.10.21.25 eod.ecoc 



This number means that this is the 25-th 
program statement in this procedure. The 
procedure number is 4.10.21 which means that it 
was called at program statement 21 in the level-3 
procedure numbered 4.10. That procedure was in turn, called 
at program statement 10 of the level-2 procedure numbered 4. 
Procedure 4, in turn, was called by program statement 4 in 
the main procedure. 



Figure 9. A SSL Specification of a Program Procedure 
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(C) A Practice of Structured Coding 

The value of structured coding techniques to the software development 
effort is generally recognized. "Structured coding" refers to a methodology 
for problem solving as well as to the particular programming constructs used 
in code development. 

The structured coding methodology is a top-down approach to the applica- 
tion of the principle of modularity, i.e., that a program procedure should 
have only one function. "Function" in this context means the transformation 
of input into output. A large problem is broken down into smaller 
sub-problems. This process is repeated until the solution for the smallest 
sub-problem is expressed as a procedure. 

Structured code requires the procedure to be written with a small set of 
programming constructs: the statement sequence, the if-then-else and case 
for decisions, the do-while for iteration. It has been proved that any pro- 
gram can be written with only these constructs. 

2.2.3 A "Black-Box" Testing Approach 

In the black-box approach to testing , test data is selected without 
reference to the internal structure of the program. Instead, test data is 
generated based on the program functions described in the requirements 
analysis study. This approach is in contrast to the structural approach to 
testing, where test data is selected based on some characteristics of the 
internal program structure, for example, the number of paths through the pro- 
gram. 



Intuitively, the black-box testing approach is applicable to testing da- 
tabase systems, since database users generally know more about the content of 
their databases than about the inner workings of the database system. Test 
data selected using the black-box approach will more closely resemble a real- 
istic test of the system. Another advantage of the black-box approach is 
that, since no knowledge of internal program structures is required to devel- 
op the test data, it is easier to integrate into the testing phase the people 
who are not involved in the development phase. 
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One application of the black-box approach is functional testing 
[Howd80]. In this application, programs are viewed as functions which map 
values from the program's domain of input variables into its domain of output 
variables. Test data is selected based on the important properties of ele- 
ments in these domains. The functional testing method is particularly suited 
to the MDBS implementation. The requirements analysis study in [Hsia81a] and 
[Hsia81b] describes the functional components of MDBS and their input and 
output domains. One example, explained earlier in Section 1.1.2, is the 
descriptor search phase of request execution. The input domain of descriptor 
search includes the set of retrieval requests; its output domain is the set 
of Boolean expressions of descriptor ids. 



2.2.4 A Uniform Documentation Standard 

The objectives of a uniform documentation standard are [Gilm79]: 

(1) To achieve precise and unambiguous communication 
among staff members. 

(2) To produce complete and accurate documentation. 

(3) To assist in project management. 

(4) To reduce dependence on individuals. 

We have an additional objective for the MDBS documentation standard: to in- 

tegrate the documentation effort into the design and development stages of 
the MDBS implementation. 

A documentation standard is developed in three steps. First, the termi- 
nology to be used must be selected. For MDBS, we adopt a set of standards 
for naming programs, program source files, and documentation text files. 
More specifically, each program will have a mnemonic name which describes its 
function 4 s well as a coded name which identifies its place in the sub- 
system hierarchy. For example, the hierarchy chart in Figure 10 shows both 
the mnemonic and coded names for the procedures of the database load subsys- 
tem. 



In the second step, the end products of the documentation effort are 
described. The organization and content of each document is planned in 
detail. For MDBS, two formal documents are proposed: a systems reference 



//fileprep 

(DBL11) 



Vtypeadef 

(DBL1111) 



// DESCRDEF 
(DBL111) 



//DBLOAD (DBL1) 
/ 




//typebdef 

(DBL1112) 



//TYPECLST (DBL1113)' 

// REVDESCR (DBLlllA) 

/ // ATTRCHAR /\. / 

/ / SRCHCLST / (DBL1121) 

(DBL1122) //SRCHCLST (DBL1122) \ 

//REVRTEMP (DBL1123) / 




// RTEMPDEF 
(DBL112) 



/DRVAORB (DBL1131) 



/ 

/ DR\ T KWORIt 
(DBL113) 



/ DRVC (DBL1132) 



Procedures on the left 
of a solid line are the 
subprocedures of the 
procedure on the right 
of the solid line. 




/ / LOADDATA 
(DBL14) 



/ PUTINLST (DBL1133) 



BLDSRT (DBL1134) 



// PROCLUST 
(DBL141) 



REVTYPEC (DBL1135) 

/GETRAND (DBL1411) 




/ Coding is completed; walkthrough 
is completed; test is to start. 

// Testing is completed also. 



/DISTRREC (1412)' 



Procedures on the left of a dotted 

line are also the subprocedures of 
the procedure on the right of the 
dotted line. 



/newclust 

(DBL14121)' 



Figure 10. A Sample Procedure Hierarchy 
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manual (SRM), and an operating procedures manual(OPM). The SRM will be de- 
veloped around the design documentation, i.e., the SSL specifications, thus 
minimizing the amount of new material to be written. Material for the 0PM 
will be developed during the design of the system's user interface. 

The above steps define the documentation task. The next step is to de- 
fine procedures for managing the documentation effort, A documentation coor- 
dinator will assist the project manager to monitor the MDBS documentation 
process. Milestones in the documentation effort are identified to establish 
a schedule by which the coordinator can measure progress. The first of these 
milestones is delivery of the SSL specification to the programmer; progress 
of the documentation will be monitored starting at that point, A 
step-by-step procedure is established which charts the documentation process 
from the first milestone to the last milestone, which is the assembly of the 
finished document. 

Conformity to the uniform documentation standard will assist the devel- 
opment group to prepare complete, accurate, and timely documentation. The 
MDBS implementation strategy calls for multiple^ versions of the MDBS proto- 
type to be developed; some of these versions will be based on previous ver- 
sions, The organization of the implementation teams is based on specific 
tasks; the team will be reorganized as new tasks replace completed tasks. 
These are two of the reasons that good documentation and a uniform documenta- 
tion standard are especially important to the MDBS implementation effort. 



2.3 A Retrospective 

After six months experience with the MDBS implementation effort, we 
reexamine our decisions. Since the implementation is in its early stages, we 
cannot make any conclusive statements. We do, however, observe that thus far 
the decisions have proved to be sound. Here we will briefly review our ex- 
perience with the hardware and systems software and with the software en- 
gineering techniques. 



PAGE 41 



2.3.1 Evaluating the Hardware and the Systems Software 

The PDPll/44s have performed as expected. The VAX11/780 is scheduled to 
be delivered soon. The PCL is installed and operational, although we have 
not yet reached a stage where the software development effort requires a 
broadcast capability, since MDBS-I and MDBS-II require no such capability. 

To date we have not had available a working version of UNIX, so all of 
the development has been done under RSX11. We hope to have Berkeley UNIX on 
the PDPll/44s very soon. The entire implementation team is learning and 
using the C language as the development effort is progressing. We have en- 
countered only those difficulties due to minimal support provided by RSX11 
for programming in C. We have not yet reached a stage in system development 
where the underlying features of the operating system are important. 



2.3.2 Evaluating the Software Engineering Experience 

The project management techniques and the design and coding techniques 
have served us well. The SSL and the structured walkthrough have been par- 
ticularly valuable. We have, however, discovered some voids in implementa- 
tion of our software engineering techniques as well as some additional areas 
where new techniques are needed. 

The largest void in implementation is that there is no project librarian 
to maintain code libraries and no documentation coordinator to supervise the 
documentation effort. An area in which the lack of any standard technique or 
procedure has proved to be a handicap is in the coding process, where data 
structures other than those encapsulated in data abstractions have been 
shared between subsystems. These problems can be solved, however, without 
invalidating any of the original decisions. It will be instructive to ob- 
serve whether this remains true as the MDBS implementation progresses. 
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3.0 THE DESIGN AND IMPLEMENTATION OF MDBS VERSIONS 



In this chapter we describe the overall designs of MDBS-I and MDBS-II. 
We then present the detailed designs of those parts of MDBS-I and MDBS-II 
that have been implemented. Occasionally, we refer to other versions of MDBS 
in the course of examining design alternatives. Thus, some of the design al- 
ternatives are also discussed. On the other hand, details of the implementa- 
tion, i.e., data structures and program modules specified in System Specifi- 
cation Language(SSL) , are not included in this chapter. Because they do not 
fit well with the designs and discussions written in the English prose, the 
implementation details are placed, instead, in the appendices. 

In Section 3.1 we first discuss the data model used and summarize the 
data manipulation language adopted. As is described in Chapter 1, records 
are grouped into clusters by descriptors. Thus we next discuss in Section 
3.2 the notion of record clustering and the use of descriptors. Finally, we 
summarize in Section 3.3 the entire process of request execution in MDBS-I 
and MDBS-II. 

Section 3.4 is devoted to directory management. There, we discuss the 
detailed design of directory management in MDBS-I. 



3 ,1 The Data Model and The Data Manipulation Language 

In this section, we develop, in detail, the attribute-based data model 
used in MDBS. We then describe the data manipulation language in which users 
may issue requests to MDBS. The language also encompasses the useful notion 
of a transaction. 



3.1.1 Concepts and Terminology 

The smallest unit of data in MDBS is a keyword which is an 
attribute-value pair, where the attribute may represent the type, quality, or 
characteristic of the value. Information is stored in and retrieved from 
MDBS in terms of records. A record is made up of a collection of keywords 



PAGE 43 



and a record body. The record body consists of a (possibly empty) string of 
characters which are not used for search purposes by MDBS. For logical rea- 
sons, all the attributes in a record are required to be distinct. An example 
of a record without record body is shown below: 

( <FILE, Employee^ <JOB, Mgr>, <DEPT, Toy>, <SALARY , 30000>). 

The record consists of four keywords. The value of the attribute DEPT, for 
instance, is Toy. In particular, the first attribute, FILE, is known as a 
system attribute and the value of the system attribute is the file name of 
the record. 

(A) Three Kinds of Keywords 

MDBS recognizes several kinds of keywords: simple, security and direc- 
tory. Simple keywords are intended for search and retrieval purposes. 
Security keywords are intended for access control. Since MDBS-I does not im- 
plement any access control feature, no reference to security keywords will be 
made in this report. Directory keywords are used for forming clusters. As 
is described in Chapter 1, records of a cluster are distributed across the 
backends. Within a backend, records of a cluster are stored in close proxim- 
ity. We will discuss the concept of a cluster and cluster algorithms in Sec- 
tion 3.2. 

(B) Keyword Predicates 

A keyword predicate . or simply predicate . is of the form (attribute, re- 
lational operator, value). A relational operator can be one of 
«{ = , ! = , >, >=, <,=<}. A keyword K is said to satisfy a predicate T if the 
attribute of K is identical to the attribute in T and the relation specified 
by the relational operator of T holds between the value of K and the value in 
T. For example, the keyword <S ALARY, 1 5000 > satisfies the predicate (SALARY > 
10000 ). 
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(C) Three Types of Descriptors 



A descriptor can be one of three types: 



Type-A: The descriptor is a conjunction of a less-than-or-equal-to predicate 
and a greater-than-or-equal-to predicate, such that the same attri- 
bute appears in both predicates. An example of a type-A descriptor 
is as follows: 



((SALARY >= 2,000) and (SALARY =< 10,000)). 



More simply, this is written as follows: 

(2,000 =< SALARY =< 10,000). 



Thus, for creating a type-A descriptor, the database creator merely 
specifies an attribute (i.e., SALARY) and a range of values ($2,000 
and $10,000) for that attribute. We term the value to the left of 
the attribute the lower limit and the value to the right of the at- 
tribute the upper limit . 

Type-B: The descriptor is an equality predicate. An example of a type-B des- 
criptor is: 



(POSITION = Professor). 

Type-C: The descriptor consists of only an attribute* name, known as the 
type-C attribute . Let us assume that there are n different keywords 
Kl, K2, ..., Kn, in the records of a database with a type-C attri- 
bute. Then, this type-C descriptor is really equivalent to n type-B 
descriptors Bl , B2 , ..., Bn, where Bi is the equality predicate sa- 
tisfied by Ki. In fact, this type-C descriptor will cause n differ- 
ent type-B descriptors to be formed. From now on, we shall refer to 
the type-B descriptors formed from a type-C descriptor as type-C 
sub-descriptors . For instance, consider that DEPT is specified as a 
type-C attribute for a file of employee records. Furthermore, let 
all employees in the file belong to either the Toy department or the 
Sales department. Then, two type-B descriptors will be formed as 
follows for this file. 



(DEPT«Toy) and (DEPT»Sales) 
They are the type-C sub-descriptors of DEPT. 
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(D) Rules for Providing Descriptors 

The database creator may cause clusters to be formed for his database by 
giving the MDBS a list of descriptors. However, he must observe certain 
rules in providing the descriptors. These are specified below: 

(1) Ranges specified in type-A descriptors for a given attribute must be 
mutually exclus ive . 

(2) For every type-B descriptor of the form (attribute-1 = value-1), no 
type-A descriptor can have the same attribute (i.e., attribute-1) and 
a range that contains its value (i.e., value-1). 

(3) An attribute that appears in a type-C descriptor must not also appear 
in a type-A or a type-B descriptor defined previously. 

(4) Type-A descriptors are specified first; type-B descriptors next; 
type-C descriptors last. 

(E) The Relationship of Keywords and Descriptors 

A keyword is said to be derived or derivable from a descriptor if one of 
the following holds: 

(1) The attribute of the keyword is specified in a type-A descriptor and 
the value is within the range of the descriptor. 

(2) The attribute and value of the keyword match those specified in a 
type-B descriptor. 

(3) The attribute of the keyword is specified in a type-C descriptor. 

(F) Query Conjunctions and Queries 

A query conjunction . or simply conjunction , is a conjunction of predi- 
cates. An example of a query conjunction is: 

( SALARY>25000 ) and (DEPT=Toy ) and (NAME=Jai) . 

We say that a record satisfies a. query conjunction if the record contains 
keywords that satisfy every predicate in the conjunction. 

A query is any arbitrary Boolean expression of predicates. An example 
of a query is: 



((DEPT=Toy) and ( SALARY <10000 ) ) or ( (DEPT=Book) and ( SALARY >50000 )) . 
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3.1.2 The Data Manipulation Language (DML) 

The data manipulation language for MDBS is a non-procedural language 
which supports four different types of requests - retrieve, insert, delete 
and update. The syntax of these various requests and examples of them are 
presented below. 

(A) Retrieve Requests 

The syntax of a retrieve request is: 

RETRIEVE Query Target-List [BY Attribute] [WITH Pointer]. 

That is, it consists of five parts. The first part is the name of the re- 
quest. The second part is a query which identifies the portion of the data- 
base to be retrieved. The target-list is a list of elements. Each element 
is either an attribute, e.g., SALARY, or an aggregate operator to be per- 
formed on an attribute, e.g., AVG(SALARY). We will support five aggregate 
operators - AVG, SUM, COUNT, MAX, MIN - in MDBS. An example of a target-list 
of two elements is ( NAME , SALARY ) . The values of an attribute in the 

target-list are retrieved from all records identified by the query. If no 
aggregate operator is specified on the attribute in the target-list, its va- 
lues in all the records identified by the query are returned directly to the 
user or user program. If an aggregate operator is specified on the attribute 
in the target-list, some computation is to be performed on all the attribute 
values in the records identified by the query and a single aggregate value is 
returned to the user or user program. The fourth part of the request, re- 
ferred to as the BY-clause , is optional as designated by the square brackets 
around it. The use of the By-clause is explained by means of an example. 
Assume that employee records are to be divided into groups on the basis of 
the departments for the purpose of calculating the average salary for all the 
employees in a department. This may be achieved by using a retrieve request 
with the specific target-list, ( AVG ( SALARY) ) , and the specific BY-clause, BY 
DEPT. Finally, the fifth part of the request, which is an optional 
WITH-clause, specifies whether pointers to the retrieved records must be re- 
turned to the user or user program for later use in an update request. Some 
examples of retrieve requests are presented below. 
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Example 1. Retrieve the names of all employees who work in the Toy De- 
partment . 

RETRIEVE (FILE® Employee ) and (DEPT=*Toy ) (NAME) 

Example 2. Retrieve the names and salaries of all employees making more 
than $5000 per year, 

RETRIEVE (FILE® Employee) and ( SALARY>5000 ) ( NAME , SALARY ) 

Example 3. Find the average salary of an employee, 

RETRIEVE (FILE® Employee ) ( A VG ( SALARY ) ) 

Example 4. List the average salary of all departments, 

RETRIEVE (FILE® Employee) (AVG( SALARY) ) BY DEPT 



(B) Insert Requests 

The syntax of an inner t request is: 

INSERT Record 

where the Record is to be inserted into the database. An example of an in- 
sert request is: 

INSERT ( <FILE , Emp 1 oy ee > , < SALARY , 5000 > , <DEPT , Toy > ) 

(C) Delete Requests 

The syntax of a delete request is: 

DELETE Query 

where the Query specifies the particular records to be deleted from the data- 
base, An example of a DELETE request is: 

DELETE (NAME®Hsiao) or ( SALARY >50000 ) 



(D) Update Requests 

The syntax of an update request is: 

UPDATE Query Modifier 

where the Query specifies the particular records to be updated from the data- 
base and the Modifier specifies the kinds of modification that need to be 
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done on records that satisfy the query. In an update request, if a single 
attribute value is to be changed, then the attribute is termed the attribute 
being modified . The modifier in an update request specifies the new value to 
be taken by the attribute being modified. The new value to be taken by the 
attribute being modified is specified as a function f of the old value of ei- 
ther the same attribute or some other attribute (say, attribute-1). More 
specifically, the modifier may be one of the following five types: 

Type-0 : <attribute=constant> 

Type-I : <attribute=f (attributed 

Type- II : <attribute=f (attribute-1 )> 

Type-Ill : <attribute=f ( attribute-1 ) of Query> 

Type-IV : <attribute=f (attribute-1 ) of Pointer> 

Let a record whose attribute is being modified be referred to as the 
record being modified . Then, a type-0 modifier sets the new value of the at- 
tribute being modified to a constant. A type-I modifier sets the new value 
of the attribute being modified to be some function of its old value in the 
record being modified. A type-II modifier sets the new value of the attri- 
bute being modified to be some function of some other attribute value in 
the record being modified. A type-III modifier sets the new value of the at- 
tribute being modified to be some function of some other attribute value in 
another record uniquely identified by the query in the modifier. Finally, a 
type-IV modifier sets the new value of the attribute being modified to be 
some function of some other attribute value in another record identified by 
the pointer in the modifier. 

An example of a type-0 modifier is: 

<SALARY=50000> 

This sets the salary in all the records being modified to 50000. 

An example of a type-I modifier is: 

<SALARY=1 .1*SALARY> 

This raises the salary in all the records being modified by 10%. 

An example of a type-II modifier is: 

<MONTHSAL=YEARSAL/ 1 2 > 

This sets the monthly salary in all the records being modified to be a 
twelfth of their own yearly salaries. 
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An example of a type-III modifier is: 

<S ALARY “SALARY of (FILE-Wife) and (NAME=*Tara)> . 

This causes the following actions to be ta<en by MDBS* Using the query 
"(FILE=Wif e) and (NAME*Tara)", a record is retrieved. Then, the SALARY value 
of that record is obtained. This value is used for the salary in all the re- 
cords being modified. 

An example of a type-IV modifier is: 

<SALARY= SALARY of 2000 > 

which modifies the salary in all the record* being modified to that of the 
record stored in location 2000. In order to use this type of modifier, the 
user must have previously issued a retrieve request which had WITH POINTER 
option. 

An example of a complete update request would be: 

UPDATE (FILE 53 Employee ) <SALARY=SALARY+5000> 
which gives a $5000 raise to all employees. 



3.1.3 Transactions and Consistencies 

In DML, we allow the flexibility for a user to specify a set of requests 
for repeated execution. Such a pre-spec Lf ied set of requests shall be re- 
ferred to as a transaction . As in other systems, a transaction must preserve 
consistency . A database-creator specifies a set of assertions on the data- 
base. These assertions are constraints which must be satisfied by data in 
the database. For instance, since employees may not have negative salaries, 
an assertion on the database may require that all employees have non-negative 
salaries. An assertion about a database is said to be true in the database 
if the data in the database satisfies the constraints in the assertion. A 
database is in a consistent state if all the assertions made on the database 
by the database-creator are true in the database. Finally, a transaction is 
said to preserve consistency if assuming the database is in a consistent 
state before the transaction is executed, then immediately after the transac- 
tion has completed execution, the database must be still in a consistent 
state. 



PAGE 50 



3 *2 The Notion of Record Clusters 

Record clusters are formed for the purposes of narrowing the search 
space and minimizing the effort needed to search for records which may satis- 
fy a given request. In other words, by organizing a database into clusters 
and by maintaining information about these clusters, MDBS may readily identi- 
fy those clusters whose records will satisfy the given request, thereby achi- 
eving high throughput and good response time. 

Although the notion of a record cluster for the aforementioned purposes 
is well known, the effectiveness of clusters for throughput gain and response 
time improvement lies in the effectiveness of the clustering algorithm for 
forming clusters and the placement strategy for storing these clusters. In 
other words, it depends on how clusters are formed and placed. Interestingly 
enough, it does not depend on how clusters are used. In other words, the 
throughput and response time of MDBS are 'immune' to the way the clusters 
are utilized. This is because every request execution by MDBS will involve 
the search and retrieval of clusters. Such search and retrieval can always 
be shown to be maximal for throughput gain and response- time improvement. 
Briefly, this is due to our use of the descriptors as a means to define and 
form clusters. As we recall, a descriptor is either a single predicate or a 
conjunction of predicates. We may also recall that a query in a user request 
is a Boolean expression of predicates. Thus, a given user request will re- 
quire the retrieval of data which satisfy the predicates of the expression. 
Since clusters are formed by the definition of descriptors and both descrip- 
tors and queries utilize the common notion of predicates, the data retrieved 
for the request are actually one or more clusters. Clusters therefore become 
the ideal formation (or unit) of data for storage and retrieval and for per- 
formance optimization. 

In the following sections, we will describe how the clusters are formed 
in MDBS and how they are used. We will begin with some definitions. 
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3.2.1 Cluster Formation 

For a database, the creator of the database specifies a number of des- 
criptors called clustering descriptors , or simply, descriptors . An attribute 
that appears in a descriptor is called a directory attribute . We say that a 
directory attribute belongs to a descriptor if the attribute appears in that 
descriptor. 

We recall that a record consists of attribute-value pairs or keywords. 
For purposes of clustering, only those keywords of the record which contain 
directory attributes are considered. Such keywords of the record are termed 
directory keywords . From the rules for forming descriptors specified ear- 
lier, it is easy to see that a directory keyword is derivable from at most 
one descriptor. For example, consider a database with SALARY as the only di- 
rectory attribute. Furthermore, let (0=<SALARY= s <50000 ) be the only descrip- 
tor D1 on SALARY specified by the database creator. Now, consider two re- 
cords, one containing the directory keyword <SALARY,25000> 3 nd the other con- 
taining the directory keyword <SALARY,750Q0> . Clearly, the former directory 
keyword is derivable from the descriptor D1 and the latter directory keyword 
is not derivable from Dl . Hence, the latter keyword is aot derivable from 
any descriptor in the database and we say that the directory keyword is 
derivable from no descriptor . Since a record may have many directory key- 
words, each of which will be derivable from at most one descriptor, we say 
that the record is derived from a. set of descriptors . It is possible for a 
record to be derived from the empty set of descriptors. There are two such 
cases. In the first case, it may happen that a record does not contain any 
directory keyword. In this case, it is said that the record is derived from 
the empty set of descriptors. Thus, going back to the previous example with 
the single directory attribute, SALARY^ and the single descriptor, 
(0=<SALARY=<50000) , a record which does not contain any salary information 
(i.e., no keyword with the attribute SALARY) is said to be derived from the 
empty set of descriptors. The second case in which a record is derived from 
the empty set of descriptors is when the record does indeed contain directory 
keywords, but these keywords are not derivable from existing descriptors. In 
the previous example, a record with the directory keyword <SALARY, 75000> 
which is not derivable from the descriptor is therefore derived from the 
empty set of descriptors also. 
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If two records are derived from the same set of descriptors, they are 
likely to be retrieved together in response to a user request, since these 
two records have keywords which are derivable from the same set of descrip- 
tors. Thus, these two records should be stored together in the same cluster. 
A cluster is, therefore, a group of records such that every record in the 
cluster is derived from the same set of descriptors. We say that a record 
cluster is defined by the set of descriptors from which all records in the 
cluster are derived. 

It is easy to see that a record belongs to one and only one cluster. 
The reasoning is as follows. A record consists of zero or more directory 
keywords. If it consists of zero directory keywords, it belongs to the clus- 
ter defined by the empty set of descriptors. If the record consists of one 
or more directory keywords, then, the record must be derived from one and 
only one set of descriptors, since each directory keyword is derived from at 
most one descriptor. This unique set of descriptors defines the unique clus- 
ter to which the record belongs. Thus, we have used the concept of descrip- 
tor sets to partition the database into equivalence classes, namely clusters. 

In order to form clusters for the records in a database, the 
record- to- c luster algorithm is provided to take a record and determine its 
cluster. For each attribute-value pair in the record, determine if the at- 
tribute is a directory attribute. If it is not, then that attribute-value 
pair is not used for cluster determination. If the attribute is a directory 
attribute, determine the descriptor, if any, from which it is derived. We 
refer to this descriptor, if any, as the corresponding descriptor for the 
given attribute-value pair. The set of corresponding descriptors for all the 
attribute-value pairs in a record defines the cluster to which the record be- 
longs. By using the algorithm on every record of a database at 
database-creation time, we may form the record clusters of the database. 

3.2.2 Cluster Determination During Request Execution 

Up to this point, we have been describing the process of cluster forma- 
tion. We will now explain how clusters are used during request execution. 
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More specifically, we will explain how to determine the cluster to which a 
new record belongs and how to determine the set of clusters which must be re- 
trieved in order to satisfy a query for retrieval, deletion or update. 

(A) Inserting Records into Clusters 

During the process of cluster formation described in the previous sec- 
tion, MDBS uses the record-to-clus ter algorithm repeatedly for determining 
the cluster of a record in the database. This same algorithm may now be used 
by MDBS to determine the cluster of a record for the record's insertion. In 
insertion, the cluster def inition table (CDT) is used in order to determine 
the secondary memory address (addresses) of this cluster. CDT is a table ma- 
intained by MDBS. There is an entry in this table for every cluster. Each 
entry consists of a cluster number, set of descriptor ids defining the clus- 
ter, and addresses of the records in the cluster. A sample CDT is depicted 
in Figure 11 . 

(B) Retrieving, Deleting and Updating Records from Clusters 

Let us describe how MDBS determines the set of clusters which satisfy 
the query in a retrieval, deletion or update request. Before we may do this, 
we must introduce some concepts and terminology. 

Descriptor X is defined to be less than descriptor Y, if the attributes 
in both descriptors are the same and one of the following holds. 

(1) Both descriptors are of type-A and the upper limit of descriptor X is 
lower than the lower limit of descriptor Y. 

(2) Both descriptors are of type-B and the value in descriptor X is 
smaller than the value in descriptor Y. 

(3) Descriptor X is of type-A and descriptor Y is of type-B and the upper 
limit of descriptor X is lower than the value in descriptor Y. 

(4) Descriptor X is of type-B and descriptor Y is of type-A and the value 
in descriptor X is smaller than the lower limit of descriptor Y. 

The above definition also covers the case where either X or Y is a 
type-C descriptor, since type-G descriptors are stored as type-B descriptors 
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Notes : 

(1) Clusters have unique cluster numbers. 

(2) No two clusters have a record in common. 

(3) A cluster is defined by a set of descriptors. 

(4) The keywords of the records in a cluster are 
derivable from the descriptors of the set 
defining the cluster. 

(5) Two sets of descriptors defining two clusters 
may have descriptors in common. 



Cluster Corresponding Set 

Number of Descriptor Ids 



Address of 
the Record in 
the Cluster 



Cl 

C2 

C3 



D2,D3 
D1,D3,D7 
D4 , D6 



Rl , R6 , R7 
R4 ,R8 
R2,R3 



Figure 11. A Sample of The Cluster Definition Table (CDT) 
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in MDBS. An exactly parallel description for the greater-than relation among 
descriptors may also be given. 

As an example, let us assume that we are given the descriptors D1 
(10000=<SALARY= <20000), D2 (0=*<S ALARY* <8000) , D3 ( SALARY=9000 ) and D4 
(SALARY*21000) . Thus, D3 is less than D1 ; D2 is less than D3 ; and Dl is 
less than D4. 

Using the above definition of less-than and greater-than for the des- 
criptors, we are ready to describe the algorithm for determining the corres- 
ponding set of clusters for a query in a user request. The query is assumed 
to be in disjunctive normal form, i.e., disjunction of conjunctions. The al- 
gorithm, known as the query- to-c luster algorithm , will proceed in three 
steps . 

Since a query conjunction consists of predicates, we will determine, in 
the first step, a corresponding descriptor or a corresponding set of 
descriptors for each predicate . This is done as follows. If the predicate 
in a query conjunction is an equality predicate, then the corresponding des- 
criptor is the one from which the keyword satisfying the predicate is 
derived. For example, if the predicate is (L0CATI0N=Napa) , then the keyword 
satisfying the predicate is <L0CATI0N, Napa> and the corresponding descriptor 
is (L0CATI0N=Napa) . If the predicate is either a less-than or 
less-than-or-equal-to predicate, it is first treated as an equality predicate 
and the corresponding descriptor D for that equality predicate is first de- 
termined. Then, all the descriptors less than D, along with D, form the cor- 
responding set of descriptors for the less-than or less-than-or-equal-to 
predicate. If the predicate is a greater-than or greater- than-or-equal-to 
predicate, then it is first treated as an equality predicate and the corres- 
ponding descriptor D for that equality predicate is first determined. Then, 
all the descriptors greater than D, along with D, form the corresponding set 
of descriptors for the greater-than or greater-than-or-equal-to predicate. 
Thus, we have determined a corresponding set of descriptors for a predicate. 



The above procedure is repeated for every predicate in the query 
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conjunction. Thus, we will have determined a corresponding set of descrip- 
tors for every predicate in a query conjunction. 

Our next step is to determine the corresponding set of clusters for a. 
query con junct ion , since a query consists of one or more query conjunctions. 
Let the query conjunction have p predicates. Let the set of descriptors cor- 
responding to the i-th predicate be Si. Now, form all possible groups, where 
each group consists of one descriptor from Si for i ranging from 1 to p. In 
other words, we are forming the cross-product of Si. The reason for forming 
this cross-product of p sets is because a query conjunction consists of a 
conjunction of p predicates, each of which has a corresponding set Si of des- 
criptors. Each element in this cross-product is termed a descriptor group 
which is of course a set of descriptors. Intuitively, a group defines a set 
of clusters whose records satisfy the query conjunction. 

We now consult the cluster definition table, i.e. CDT (see Figure 11 
again.) However, the definitions kept in the table may not be identical to 
the definitions of the groups. Without relating the descriptor groups with 
the descriptor sets kept in the table, we may not be able to determine the 
clusters involved. Thus, this second step includes the determination of 
whether there are descriptor sets in the table which contain a descriptor 
group. If there are such sets, then the clusters defined by the descriptor 
sets are indeed the clusters referred to by the descriptor group. 

By repeating this procedure for every descriptor group in the 
cross-product, we are able to determine the corresponding set of clusters for 
a query conjunction. The entire second step which is used to determine the 
corresponding set of clusters for a query conjunction is then repeated for 
every query conjunction in the query. Thus, we have determined a correspond- 
ing set of clusters for every query conjunction in the query. 

The final step of the algorithm determines the corresponding set of 
clusters for the query from the corresponding set of clusters for each query 
conjunction in the query. Since the query is a disjunction of conjunctions, 
the corresponding set can be simply obtained as the union of the sets of 
clusters for each query conjunction in the query. 
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3 .3 The Entire Process of Request Execution 

In this section, we discuss the entire sequence of actions performed by 
MDBS in processing the four different types of requests. We shall discuss 
each type of request, in turn. 



3.3.1 Executing an Insert Request 

The syntax of an insert request in MDBS is 

INSERT Record. 

The controller will first parse the request and determine that it is an in- 
sert request. Next, the controller will broadcast the request to all the 
backends. The backends will perform descriptor processing. At the end of 
the descriptor search phase, the single cluster to which the record to be in- 
serted is known to the backend(s) whose secondary memory (memories) has 
(have) been accommodating the cluster. The reason that more than one backend 
may be involved in accommodating the cluster in consideration is that the 
cluster being sufficiently large has been evenly distributed by the data 
placement strategy over several backends' secondary memories at the 
database-creation time. Consequently, MDBS must decide which backend's sec- 
ondary memory is to be used for accommodating the new record. By consulting 
the c luster- id- to-next-backend table (CINBT), MDBS can select the secondary 
memory of a specific backend for record insertion. The CINBT is created at 
the database-creation time by the data placement strategy. A sample CINBT is 
depicted in Figure 12. 



3.3.2 Executing a Retrieve Request 

We recall that the syntax of a retrieve request in MDBS is as follows ; 

RETRIEVE Query Target-list [By Attribute ] [WITH Pointer] . 

The controller will first parse the request and determine that it is a 
retrieve request. Next, the controller will broadcast the request to all the 
backends. The backends will perform descriptor processing and address gener- 
ation. Upon completion, each backend has a list of secondary memory 



Notes : 

(1) The number of backends in a MDBS may be 
large, say, 6. 

(2) A cluster of many records is stored in a 
specific round-robin way among the backends’ 
disk drives, 

(3) This table is kept up to date by MDBS as new 
records are inserted into the database and 
existing records are modified which result 
in changes of clusters. 
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Figure 12, The Cluster-Id-To-Next-Backend Table (CINBT) 
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addresses of the tracks which contain the relevant records. These tracks are 
accessed by the backend. The query in the request is used to select the 
records from these tracks. First, the records satisfying the query are se- 
lected. If a BY-clause is specified in the retrieve request, the selected 
records are grouped by the values of the attribute in the BY-clause. If no 
BY-clause is specified in the retrieve request, all the selected records are 
treated as a single set. Next, for each set of selected records, the values 
of all attributes in the target-list are extracted from the records of the 
set. If no aggregate operator is specified on an attribute in the 
target-list, the extracted values of the set are returned to the controller. 
If an aggregate operator is specified on an attribute in the target-list, 
some computation is performed on all the attribute values in the records of 
the set and the results are returned to the controller. For example, to com- 
pute the average salary, each backend computes the sum of all the salaries in 
its set of retrieved records. It then returns this sum and a count of the 
number of records in the set to the controller. The controller combines the 
sums and counts from all the backends to give the average salary, which is 
returned to the user. This completes the actions performed by a backend on 
each set of selected records. If a WITH-clause is specified in the retrieve 
request, the secondary memory addresses of all selected records must also be 
sent to the controller by each backend. 

The controller will wait for responses from all the backends. Upon re- 
ceiving all the responses (i.e., attribute values, aggregate values or ad- 
dresses) from all backends, the controller wiil forward these responses to 
the user that issued the retrieve request. This completes the execution of 
the retrieve request. 



3.3.3 Executing a Delete Request 

As we recall, the syntax of a delete request is 

DELETE Query 

The execution of this request in MDBS is similar to the execution of a 
retrieve request. The controller will first parse the request and determine 
that it is a delete request. Next, the controller will broadcast the request 
to all backends. The backends will perform descriptor processing and address 
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generation. Upon completion, each backend has a list of secondary memory ad- 
dresses of tracks which contain relevant records. Records of these tracks 
are retrieved from the secondary memory by respective backends. The query in 
the delete request is used to select the records which are to be deleted. 
The selected records are then marked for deletion. The track space occupied 
by the marked records is not immediately recovered. Such recovery of space 
will be done during database reorganization time. After the records are 
marked, the marked records are written back to the same tracks by each back- 
end. If all the records in a track are marked for deletion, the address of 
this track is removed from all entries in which it appears in the cluster de- 
finition table (CDT). Finally, each backend will send an acknowledgement to 
the controller to indicate that it has finished executing the delete request. 
Upon receiving the acknowledgements from all the backends, the controller 
will inform the user or user program that the delete request has successfully 
been completed. 



3.3.4 Executing an Update Request 

The syntax of an update request in MDBS is as follows : 

UPDATE Query Modifier. 

We recall that the modifier in an update request specifies the new value to 
be taken by the attribute being modified and that it may be one of the types 
described below. 

Type-0 : <attribute = constant> 

Type-I : <attribute = f (attribute) > 

Type-II : <attribute = f ( attribute-1 ) > 

Type-III : <attribute = f (attribute-1 ) of Query> 

Type-IV : <attribute = f( attribute-1 ) of Pointer> 

An update request containing a modifier of types 0 , I or II is broadcast 
by the controller to all the backends. The backends will perform descriptor 
processing and address generation. Afterwards, each backend has a list of 
secondary memory addresses of the tracks containing the relevant records. 
These tracks are accessed by respective backends and the records satisfying 
the query are selected from these tracks. These are the records being modi- 
fied. 
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Each of these records is changed according to the modifier in the update 
request • If the modifier is of type-0, the new value is provided in the mod- 
ifier* If the modifier is of type-I, the new value is computed as a function 
(specified in the modifier) of the value of the same attribute. Finally, if 
the modifier is of type-II, i.e. of the form <attribute = f(attribute-l)>, 
the new value is computed as a function f of the value of the attribute-1 in 
that record. 

Due to its change in attribute values, an updated record may remain in 
the same cluster to which it (more precisely, pre-updated version) belonged 
or it may now belong to a different cluster. In the latter case, a record is 
said to change cluster . Recall that a cluster is a group of records such 
that every record in the cluster is derived from the same set of descriptors. 
Thus, an updated record will belong to a different cluster only if the set of 
descriptors from which it is derived is different from the set o i descriptors 
from which the pre-updated version was derived. If the attribute* being modi- 
fied in an updated record is not a directory attribute, the upcated record 
continues to be derived from the same set of descriptors, since only directo- 
ry attributes affect the descriptors. Hence, the updated record does not 
change cluster. If the attribute being modified is a directory attribute, an 
updated record may change cluster. If an updated record changes cluster, the 
pre-updated record is marked for deletion and the updated recorc is inserted 
in the appropriate cluster. 

Finally, each backend will send an acknowledgement to the controller to 
indicate that it has finished processing the update request. When it has re- 
ceived acknowledgements from all backends, the controller will return a mes- 
sage to the user to signal successful completion of the update request. This 
completes the processing of an update request containing modifiers of types 
0, I or II. 

Now, let us describe the execution of an update request containing a 
type-III or type-IV modifier. Recall that these modifiers have the form 
<attribute =* f( attribute-1 ) of Query> and <attribute = f (attribute-1 ) of 
Pointer>. Thus, in this case, another record must first be retrieved by MDBS 
on the basis of a user-provided query or pointer. After the record is 
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retrieved, the controller will extract the attribute-1 value v from the re- 
trieved record. It will then compute the function f (specified in the 
type-III or type-IV modifier) on the value v and thus obtain a new value v'. 
The controller will then form a type-0 modifier of the form 

<attribute = v'> 

where attribute is the one that appeared to the left of the equality sign in 
the type-III or type-IV modifier. The original type-III or type-IV modifier 
in the update request is now replaced with this newly created type-0 modif- 
ier. In other words, MDBS converts an update request containing a type-III 
or type-IV modifier to an update request containing a type-0 modifier. This 
update request containing a type-0 modifier may now be executed in the same 
manner described previously. 



3 .4 Directory Management 

In this section, we describe the detailed design and implementation of 
directory management in MDBS-I. 



3.4.1 The Input: Non- Insert Requests and Insert Requests 

The input to directory management is either the record part of an insert 
request or the query part of a retrieve, delete, or update request. The 
three non-insert request types, namely, retrieve, delete and update, require 
the same directory management. However, the insert request type requires a 
different directory management. Thus we will describe directory management 
in terms of two categories: non- inserts and inserts. 

We recall that the directory management in MDBS-I consists of three 
phases. In the first phase, MDBS determines the corresponding descriptors 
either for each predicate of a query in the case of a non-insert request or 
for each keyword of a record in the case of an insert request. In the second 
phase, MDBS determines either the corresponding set of clusters in the case 
of a non-insert request or the corresponding single cluster or a new cluster 
in the case of an insert request. In the third phase, MDBS determines either 
the addresses of clusters in the case of a non-insert request or a single ad- 
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dress for inserting the record in the case of an insert request. (See Figure 
2 again.) The following tables are used in the three phases for processing 
either non- insert or insert requests. 



(A) Four Directory Tables: The Descriptor-to-Descriptor-Id Table (DDIT), 

The Attribute Table (AT), The Cluster-Definition Table (CDT) and 
The Cluster-Id-to-Next-Backend Table (CINBT) 

These tables are an integrated part of the directory management. 
Logically, they are defined as follows: 

All the descriptors defined by the database creator are stored in the 
deacriptor-to-descriptor-id table (DDIT). There is a descriptor id associat- 
ed with each descriptor. A sample DDIT is depicted in Figure 13. 

There is an entry in the attribute table (AT) for every directory attri- 
bute. A pointer to the DDIT is stored with each directory attribute. The 
pointer points to the first descriptor whose attribute is identical to the 
corresponding directory attribute. A sample AT is depicted in Figure 14. 
Also shown in the figure is the DDIT of Figure 13. By showing these two 
tables together, we can easily depict the pointers of AT. 

The cluster-definition table (CDT) is described in Section 3.2.2. A 
sample CDT is also depicted earlier in Figure 11, so we do not repeat the 
figure here. However, we do repeat the definition here. There is an entry 
in this table for every cluster. Each entry consists of the cluster number, 
the set of descriptor ids whose descriptors define the cluster, and addresses 
of the records in the cluster. 

The cluster- id-to-next-backend table (CINBT) is also depicted earlier in 
Figure 12. A backend for record insertion is chosen on the basis of this 
table. 
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Notes : 

(1) Descriptors are provided by the database creator. 

(2) A set of descriptors defines a cluster. 

(3) Clusters are system entities which are 'transparent' 
to the user. 
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Figure 13. 



The Descriotor-To-Descriptor-Id Table (DDIT) 
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Figure 14. The Attribute Table (AT) and 
its Relationship to DDIT 
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(B) Three Phases of Processing: Descriptor Search, Cluster Search and 
Address Generation 

As described in Chapter 1, directory management has three phases. In 
the first phase, both AT and DDIT are searched to determine the corresponding 
descriptors either for each predicate of a query in the case of a non-insert 
request or for each keyword of a record in the case of an insert request. 
This is the descriptor search phase . In the second phase, the CDT is 
searched. For descriptors produced from the previous phase, either the cor- 
responding single cluster in the case of an insert request or the correspond- 
ing set of clusters in the case of a non-insert request is determined. This 
is the cluster search phase . By searching the same CDT, the addresses of 
clusters can be found in the third phase. This is the address generation 
phase . 



(C) The Choice of a Processing Strategy for the Controller and the Backends 

In previous discussions, we make no distinction whether the three phases 
are carried out in a single computer (i.e., either the controller or one of 
the backends) or in multiple computers (a controller and several backends). 
In [Hsia81a] , six different strategies for carrying out the descriptor search 
phase in the multiple backends and one strategy for carrying out the descrip- 
tor search phase in the controller are examined. There are also two strateg- 
ies for carrying out the cluster search and address generation phases: one 
in the controller and the other in the backends. 

If we are to achieve an ideal system in which the response time is in- 
versely proportional to the number of backends, we need to distribute the di- 
rectory management work among the backends. By carrying out the directory 
management in the backends, MDBS may be alleviated from the controller limi- 
tation problem as suggested in [Hsia81a] . 

In the following, we describe those three strategies that distribute the 
work among the backends and utilize parallel processing by the backends. All 
three strategies carry out the cluster search phase and the address genera- 
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tion phase in all the backends. By carrying out these two phases in the 
backends, each backend would need to generate only those secondary memory ad- 
dresses associated with that backend. On the other hand, if the addresses 
were to be generated by the controller, the controller would need to generate 
all the relevant secondary memory addresses associated with all the backends. 
Thus, the former case distributes address generation work among the backends; 
the latter case does not and concentrates all the work in the controller. 

(1) The Fully-Duplicated Strategy 

In this strategy, AT and DDIT are fully duplicated in all the backends. 
However, CDT is not duplicated. Instead, only the portion of CDT which is 
relevant to those clusters stored in the backend is placed in that backend. 
The descriptor search work is distributed among the backends. More specifi- 
cally, if there are n backends in MDBS and a query contains x predicates, 
each backend will perform descriptor search, by using AT and DDIT, on x/n 
predicates and generate x/n corresponding descriptor sets which will, in 
turn, be communicated to all other backends. Each backend then performs, 
by using its portion of CDT, the cluster search phase and the address 
generation phase. 

(2) The Descriptors-Division-Within-Attribute Strategy 

In this strategy, AT is duplicated in all the backends. DDIT and CDT 
are not duplicated. If there are i descriptors on each directory attribute, 
each backend will maintain for each attribute i/n descriptors. Each backend 
performs descriptor search on all the predicates to generate part of corres- 
ponding descriptor sets. After each backend obtains some results, they ex- 
change their results. Then, each backend proceeds with its own cluster 
search phase and address generation phase. 

(3) The Fully-Replicated Strategy 

In this strategy as in strategy 1, AT and DDIT are duplicated in all the 
backends. CDT is not duplicated. However, unlike strategy 1, each backend 
will work on the entire query during the descriptor search phase, instead of 
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x/n predicates of the query. The advantage of letting each backend do the 
descriptor search on all predicates is that exchanges of descriptors among 
backends are unnecessary in this strategy because each backend has all the 
needed descriptors. After completing the descriptor search, each backend 
does its cluster search phase and address generation phase. 

According to the analyses in [Hsia81a], strategy 2 has a poor 
average-and-worst case performance for typical number of attributes and typi- 
cal number of descriptors per attribute; strategy 3 replicates the descrip- 
tor search phase; strategy 1 does not have the shortcomings of the other two 
strategies. Consequently, we choose to design and implement strategy 1 for 
directory management. In addition to utilizing strategy 1 for parallel pro- 
cessing of the three directory management phases for non-insert requests and 
the first two phases for insert requests by the backends, we choose the 
strategy of placing the CINBT entirely in the controller to be used only by 
the controller. For insert requests, the controller consults this table to 
select a backend for record insertion. Thus, records in a cluster can be 
distributed across the backends in order to achieve maximum parallel process- 
ing by the backends for subsequent requests. 



3.4.2 The Use of Abstractions and Tables for Implementation 

In this section, we detail the first implementation of the directory 
management of MDBS-I. As outlined in Chapter 1, this implementation does not 
provide concurrency control and access control. It maintains the directory 
information in the main memory only. In this implementation, cluster search 
and address generation are carried out together. Thus, in the sequel, we 
refer to descriptor search as phase and to cluster search and address gen- 
eration as phase II . The input to phase I is either the record part of an 
insert request or the query part of a non-insert request, and the output is a 
set of descriptor ids corresponding to the descriptors derived from either 
the keywords of the record or the predicates of the query in the user re- 
quest. Phase II makes use of these descriptor ids to come up with the cor- 
responding cluster ids and, in turn, the set of secondary memory addresses 
for I/O operations. 
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(A) Two Data Abstractions for Descriptor Search 

In compliance with the design decision of treating data structures and 
services, which are necessary in the phase I processing, as abstractions, 
both AT and DDIT tables are enclosed in data abstractions. For AT, the ab- 
straction is the attribute-table module (ATM), and for DDIT it is the 
descriptor- to-de scrip tor- id- table module (DDITM). This approach requires ac- 
cess to these tables via explicit calls to procedures that operate on the 
tables. 

(B) The Difference Between Descriptor Sets and Descriptor Groups 

We now make the distinction between descriptor sets and descriptor 
groups by means of an example. Let us assume that MDBS has the following 
DDIT and CDT for the employee file: 



For this file, the descriptor set for cluster Cl, for example, is 
{ 20000=<SALARY= <30000 , 20=<AGE=<30, SEX=M } 

Now, consider the following retrieval request. 

RETRIEVE (FILE= Employee ) and (SALARY >=20 000) and (AGE=<50) (NAME) 

In referring to DDIT, we see that the predicates of the requests have the 
following derivability . The predicate (SALARY>=20000) is derivable from ei- 
ther the descriptor (20000=<SALARY=<30000) or the descriptor 
(40000=<SALARY= <60000) ; and the predicate (AGE=<50) is derivable from either 
the descriptor (20=<AGE=<30) or the descriptor (31**<AGE“<50) . Using their 
descriptor ids instead of the descriptors themselves, we learn that the query 
of the request is derivable from the following 

(D2 or D3) and (D4 or D5) 
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So, for the employee file MDBS should look for clusters whose descriptor- id 
sets contain {D2,D4} or {D2,D5} or {D3,D4} or {D3,D5}. 

To distinguish sets in CDT from those derived from the predicates, we 
term the aforementioned four collections of descriptor ids the descriptor- id 
groups and their corresponding descriptor collections the descriptor groups . 
For {D2,D4}, for example, the descriptor group is 

{ 20000=<SALARY=*<30000 , 20=<AGE=<30 > 

Thus, descriptor sets are associated with clusters and created either at the 
database creation time or when there is a new cluster, whereas descriptor 
groups are obtained from the query part or the record part of the request and 
they change from request to request. For the above retrieval request, the 
descriptor- id set of cluster Cl contains the descriptor-id group {D2,D4} and 
the descriptor-id set of cluster C4 contains the descriptor- id group {D3,D5}. 
Thus, the records in these clusters, i.e., {Rl,R5,R8}, are retrieved, select- 
ed and the NAME values in the selected records are returned to the user. 

Phase II needs descriptor- id groups to come up with cluster numbers and, 
in turn, addresses of the records in those clusters. In the next section, we 
describe how MDBS-I generates descriptor-id groups. 

(C) The Generation of the Descriptor- id Groups for a Request 

In order to generate the descriptor-id groups readily, we introduce the 
encoding scheme of location parameter . From the query part of a non-insert 
request, the scheme extracts the conjunctions of the query and numbers them 
consecutively. Each predicate is then identified by its conjunction number 
followed by its relative position in that conjunction. For example, in the 
following query part of a non-insert request 

((DEPT=Shoe) and ( SALARY >10000 ) ) or ( (DEPT=Toy ) and ( SALARY <15000 ) ) 
the predicate (DEPT=Shoe) has the location parameter 11, since it is the 
first predicate of the first conjunction. Thus, for the above query the 
predicates have their location parameters represented on the left hand side: 

11 DEPT=Shoe 

12 SALARY>10000 

21 DEPT=Toy 

22 SALARY <15000 
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In the case of insert requests, the keywords of the record are treated 
as one conjunction, so the first number of the location parameter is always 
1. Furthermore, the second number of the location parameter is not the rela- 
tive predicate number, but the relative keyword number since the record to be 
inserted consists of keywords instead of predicates. 

(D) A Service Abstraction for Passing Descriptor- id Groups to Cluster 
Search 

The output of phase I, the corresponding descriptor ids, are the input 
to phase II. Since the format of the input to phase II depends on the clus- 
ter search strategy on CDT employed in that phase, format and strategy 
changes in one of the phases can affect the other phase. In order to make 
each phase immune to the changes made in tie other phase, a service abstrac- 
tion is placed between the two phases. This abstraction, known as directory 
interface (DIRINT), accepts the output of phase I and produces the input for 
phase II. All the abstractions are documented in the appendicies. 

For the output of phase I, DIRINT produces a table called request 
descriptor- id table (RDIT), given a query part or a record part of the 
request. Each entry of the table is an ordered pair of location parameters 
and descriptor ids. Thus, an entry of RDIT indicates the id of a descriptor 
derived from the predicate or the keyword and is uniquely identified by the 
location parameter in the entry. If multiple descriptors are derived from a 
predicate, then there are multiple entries in RDIT, one for each such des- 
criptor. In this case, RDIT contains the descriptor ids of all the descrip- 
tors derived from the predicate. In Figure 15, we depict a sample of RDIT. 

(E) A Data Abstraction and Three Directory Tables for Cluster Search and 
Address Generation 

In phase II, MDBS-I makes use of three tables : the descriptor table, 
the descriptor-to-cluster map, and the extended cluster definition table. 
Each entry of the descriptor table (DT) contains the id of a descriptor that 
has been defined for a given database, the number of clusters defined for the 
descriptor, and a pointer to the first cluster of those defined for the des- 
criptor. 
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A Sample Request-Descriptor-Id Table (RDIT) 
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The de scr ip tor-to-c luster map (DTCM) serves the purpose of mapping des- 
criptors to clusters. It is maintained in such a way that all the DTCM en- 
tries for a descriptor are linked together. Each DTCM entry, then, points to 
a cluster definition whose descriptor- id set contains the descriptor id of 
this descriptor. 

The extended cluster definition table (ECDT) contains more information 
about each cluster than CDT, which was discussed in Section 3.2.2 and 
depicted in Figure 11. Each entry consists of the cluster number of a cluster, 
number of descriptors defining the cluster, a pointer to the list of descrip- 
tor ids whose descriptors define the cluster, and a pointer to the list of 
addresses of records belonging to this cluster. 

All of these tables are enclosed within a data abstraction called 
cluster-def init ion- table module (CDTM). A sample of the tables is depicted 
in Figure 16. 



(F) A Typical Sequence of Directory Management Actions for an Insert Request 

When there is a request for inserting a record, the following directory 
management takes place in MDBS-I. An equality predicate is constructed for 
each keyword of the record. For example, the keyword <NAME=Kerr> becomes the 
predicate (NAME=Kerr ) . Then, for each predicate, the descriptor id of the 
descriptor derived from the predicate is found by using AT and DDIT. This 
process is repeated for every keyword of the record. All the descriptor ids 
are then put into RDIT via the service abstraction DIRINT. 

The descriptor- id group corresponding to the record being inserted is 
obtained from RDIT via DIRINT. We note that there is only one descriptor- id 
group because each of the equality predicates constructed from the keywords 
is derived from at most one descriptor. Among the descriptor ids in the des- 
criptor-id group, the id of the descriptor that participates in defining the 
smallest number of clusters is chosen by using DT. Let us call this descrip- 
tor id Dm. By using DT, DTCM, and ECDT, all the clusters whose descriptor- id 
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Figure 16. An Example of DT, DTCM and ECDT 
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sets contain Dm are examined. If there is a cluster whose descriptor- id set 
matches the descriptor- id group, then the record being inserted belongs to 
the cluster identified. We note again that for an insert request, the des- 
criptor-id set must match the descriptor- id group so that the record may be 
inserted in the cluster whose records are derived from the same set of des- 
criptors. 

(G) A Typical Directory Management Sequence of Actions for a Non-insert Re- 
quest 



When there is a non- insert request, the following directory management 
takes place in MDBS-I. For each predicate in the query part of the request, 
all the descriptor ids of the descriptors derived from the predicate are 
found by using AT and DDIT. All the descriptor ids are put into RDIT via the 
service abstraction DIRINT. 

Each of the descriptor- id groups corresponding to the query is obtained 
from RDIT via DIRINT. We note that there may be more than one descriptor- id 
group because each predicate of the query may be derived from more than one 
descriptor. See the example in part (B) of this section. Among the descrip- 
tor ids in the descriptor- id group, the id whose descriptor participates in 
defining the smallest number of clusters is chosen by using DT. This des- 
criptor id is designated with Dm. By using DT, DTCM, and ECDT, all the clus- 
ters whose descriptor- id sets contain Dm are examined. The clusters whose 
descriptor-id sets contain the descriptor-id group are therefore found. This 
process is repeated for each descriptor- id group. We note that for a 
non- insert request, the descriptor-id set does not have to be identical to 
the descriptor- id group as long as the set contains the group. Then, the ad- 
dresses of the records in the clusters just found are obtained. 
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4.0 LOADING THE DATABASE 



In MDBS, as in other database systems, a database creator may want to 
load a database with data that exists elsewhere. Such data may reside as 
files on magnetic tapes, for example. The database creator can use a 
software tool, provided by MDBS and called database load , to specify the 
source data files and to create a database. In this section, we describe the 
design of this tool. The implementation details for the version used in 
MDBS- I are placed in Appendix C. 



4.1 T hree Directory Tables for Loading 

A user of the database- load subsystem may want to consolidate several 
related files into one database. In this case, there will be one attribute 
table (AT), one descripter-to-descripter-id table (DDIT) and one 
extended-cluster-def inition table (ECDT) for the database. Alternatively, 
the user may want each file to become a separate database. In this case 
there will be a separate AT, DDIT and ECDT for each database. 



4 . 2 Fcmr Phases of Database Loading 

The database load subsystem, as seen by the user and shown in Figure 17, 
executes in four logical phases. First, the user specifies various charac- 
teristics of the existing source files and of the database to be created and 
loaded. Then the data is read from user supplied source-files and prepared 
for loading. Next, the data is grouped into clusters. After clustering, the 
data is distributed to the backends. The programs in the database- load sub- 
system run mainly in the controller. However, the database-load subsystem 
does include the distribution of records and directory tables to the back- 
ends . 



4.2.1 The Database Definition Phase 

Before the source files are read, two tasks are accomplished in this 
phase. The first task is descriptor def inition . In this task the user spec- 
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Figure 17. Four Phases of Database Loading 
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ifies all directory attributes for the database. Then the user specifies the 
upper and lower bounds for each type-A descriptor and the value for each 
type-B descriptor. As these values are given, they are checked against pre- 
viously defined descriptors to make sure there is no overlap of the ranges 
and values specified. In other words, the rules governing the proper use of 
descriptors given in Section 3.1.1 are enforced by the database load subsys- 
tem. 



Only the attribute names of type-C descriptors are specified at this 
time. The type-C sub-descriptors of the type-C descriptors will be formed 
later when the actual source records are processed. As described above, 
clusters may be formed for one file at a time or for all files at the same 
time. When clusters are formed for all the files, the descriptor definition 
procedure will be invoked only once per database; when clusters are formed 
for separate files, the descriptor definition procedure will be invoked once 
per file. 

During execution of this task the attribute table (AT) will be built 
using the data abstraction of the attribute table module (ATM). In addition, 
descriptor-to-descriptor-id table (DDIT) entries for all type-A and type-B 
descriptors will be established using the data abstraction of the 
descriptor-to-descriptor-id module (DDITM) . The ATM and DDITM are described 
in Section 3.4.2. The type-C sub-descriptors formed from the type-C descrip- 
tor entries will be added later as the source-data is examined in the next 
phase. 

The second task is def inition of attribute characteristics for each 
file. A file is defined to include records of one format only. A record 
template will be built for each file. It will include an entry for each at- 
tribute. Each entry will include the attribute name, data-type (e.g., in- 
teger), length, etc. For each source file, the user must supply the names of 
all the attributes in the records. Then for each attribute, the user must 
define the data-type. If the data-type is character string, the user must 
specify whether the strings are of fixed or variable length. The user must 
also specify the minimum and maximum values of each integer type as well as 
the minimum and maximum lengths of each character string type. All of the 
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values specified are stored in the record template by the record template 
module , 

4.2.2 The Record Preparation Phase 

This phase includes the conversion of source records into the format re- 
quired for internal storage in MDBS. As each record is examined the set of 
descriptors from which the keywords of the record can be derived will be de- 
termined using the ATM. and DDITM abstractions. In conjunction with this 
task, the type-C sub-descriptor entries formed from each type-C descriptor 
will be added to DDIT. Additionally, the formatted record will be appended 
to the descriptor-id set corresponding to the descriptors derived from the 
record. Both the descriptor- id set and their appended records are the input 
to the next phase. At the end of this phase the attribute table (AT) and the 
descripter-to-descripter-*id table (DDIT) are complete. 



4.2.3 The Record Clustering Phase 

This phase separates the records into clusters. As is described in 
Chapters 1 and 3, all the records in a cluster are derived from the same set 
of descriptors. Thus, separating the records into clusters is accomplished 
by sorting the records according to the descriptor ids appended to each re- 
cord in the previous phase. A sort package is used for this phase. 



4.2.4 The Record and Table Distribution Phase 

The last phase is distribution of data to the backends. The records are 
distributed to the backends one cluster at a time. For each cluster, the 
descriptors defining the cluster are broadcast to the backends so that the 
cluster can be defined in the extended-cluster-definition table (ECDT) using 
procedures in the cluster-def inition-table module (CDTM) that was described 
in Section 3.4.2. Typically, a cluster will contain many records. Within a 
cluster, the records are spread across the backends. Sufficient records to 
fill one disk track are sent to one backend. Then sufficient records to fill 
a second track are sent to second backend. This procedure continues until 
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all the records have been distributed. It should be noted, of course, that 
the last group of records may not fill a track. The information about the 
last backend and amount of track space available is kept in the 
cluster-id-to-next-backend table (CINBT) so that the next records to be in- 
serted into that cluster can be stored in that partially filled track. 

In order to distribute the data evenly across all the backends, the 
first backend to receive records is chosen randomly. Then the choice of 
backends goes in sequence* This distribution strategy was called the 
track-splitting-with- random-placement strategy in [HsiaSla] . 

This phase also distributes the system tables to the backends. The at- 
tribute table (AT) and the descriptor- to-descriptor- id table (DDIT) are com- 
plete after the record preparation phase. Portions of the 
extended-cluster-definition table (ECDT) will be built at each backend. The 
portion of ECDT at a backend will contain only the addresses of the records 
stored in that backend. 



4.3 The Implementation Status 

The complete design of the database load subsystem for MDBS-I is includ- 
ed in Appendix C. Coding has been completed for almost all the procedures. 
Testing is completed for about half of the procedures. As described in Ap- 
pendix A, Appendix C shows exactly which procedures have been completed and 
which procedures have been tested. This information is also included in Sec- 
tion 2.2.4 as Figure 10. 
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5.0 THE TEST FILE GENERATION 

Program-generated test data will be used for two purposes. First, we 
will be testing each version of MDBS to see that it works correctly. Second, 
as described in Chapter 1, the initial performance evaluation experiments 
will use program-generated data. 



5*1 Three Types of Test Data 

The test data to be generated will be organized into files. Thus we 
designate the program the F ile Generaton P ackage. The characteristics of a 
file are specified by the user of the file generation package. Each file has 
a file-name and a certain number-of-records . Each record in a database is 
composed of a set of attribute-value pairs. For initial testing purposes, we 
have decided to require that all records in a file have the same attributes. 
Thus each record has a fixed number-of-at tributes . The values of an attri- 
bute in different records are restricted to a particular data-type . The pos- 
sible data-types are integer , string (i.e., character-strings) and float 
(i.e., floating-point numbers). 

In addition to specifying the format of data to be generated, we must 
also specify how particular values of each record are to be generated. The 
first means of generating a value is to use a routine that generates a 
random- integer . random-string or random-float value. These routines make use 
of random-number generators to arrive at a value from some particular distri- 
butions of potential values. Thus the values of the first attribute might be 
a randomly chosen integer between 100 and 500. The value of the second at- 
tribute might be a random character string. The value of the third attribute 
might be a random floating-point number. 



5 .2 Random Test Data vs . Realistic Test Data 

The data generated as just described is fine for program testing and in- 
itial performance evaluation. However, since each value is generated random- 
ly by a program, the test data is not selectd by the user. In order to gen- 
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erate realistic data for the user, a second form of value generation is also 
supplied. In this form, a user may specify the data sets, say, a set of 
names. Then the user can direct the file generation package to select values 
for an attribute from one of these predefined sets. Once a set is defined, 
its values are saved for later use. 



5.3 Steps in Test File Generation 

The file generation package works in three steps. First, the user de- 
fines the form of the file to be generated, i.e., the number-of-records , 
number-of-attributes and the characteristics of each attribute. Then initial 
processing of test data sets follows. If the user wants to use sets that al- 
ready exist, then the data of those sets are loaded into the main memory. If 
there are new sets that the user wants to specify then the program prompts 
for the values of the data of the sets, which are then loaded into the main 
memory and also stored in the secondary memory for later use. After all the 
sets are loaded into the main memory, the final step is the actual generation 
of the records. 



5 *4 The Relationship of the Package to Testing Strategies and 

Performance Evaluation Experiments 

The first use for the file generation package is for the black-box test- 
ing of MDBS as described in Section 2.2.3. In particular, the system testers 
will be able to generate easily any form of test databases that they require. 
They will then only have to generate sample requests in order to run tests to 
see if MDBS is working correctly. 

The second use for the file generation package is for the type of per- 
formance evaluation experiments using program-generated data as described in 
Section 1.2.1(A). For these tests, the experimenters can vary the form of 
the database by varying the distribution of different types of data. They 
can see how MDBS performs on different types of queries and using different 
numbers of backends. 
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5 *5 Current Status of the Package 

The file generation package is now working in its initial form and is 
ready to be used for black-box testing. The package handles integer and 
string data types. The subsystem which handles data sets is finished. The 
routines to generate data values from a uniform distribution are complete. 
The data type, float, must still be added. Routines to generate data values 
from distributions other than uniform are not needed for black-box testing. 
They will be added if it is determined that they are needed for performance 
evaluation experiments . 

The design of the complete file generation package is included in Appen- 
dix B. The first version of the operating procedures manual (OPM) is also 
completed . 
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6.0 PLANS FOR THE NEXT MDBS VERSIONS 

As we recall from Chapter 1, MDBS-I will not provide concurrency con- 
trol. We also note that MDBS-I does not provide a secondary-memory-based di- 
rectory management. Instead, MDBS-I utilizes the main memory for directory 
management. We plan, therefore, to implement a concurrency control mechanism 
in MDBS-II and an efficient directory management utilizing the secondary mem- 
ory for MDBS-IV. 

The basic design of the concurrency control mechanism is included in 
[Hsia81b]. We will not elaborate on the basic design here except to note 
that the detailed design eliminates any need for communication among the 
backends other than the required exchange of descriptors during the descrip- 
tor search phase of directory management. In this section, we will discuss 
one of most important system issues which must be resolved before we can im- 
plement the concurrency control mechanism, i.e., how will MDBS interface with 
the operating systems at the controller and at the backends? On the other 
hand, we will not discuss various approaches toward an efficient directory 
management based on the secondary memory since our preliminary studies on the 
approaches are still inconclusive. 



6 . 1 Interfacing with Operating Systems 

Most operating systems provide mechanisms for allowing concurrent execu- 
tion of different processes. These mechanisms include primitives for commun- 
ication and synchronization among processes. Process communication and syn- 
chronization primitives of the operating system are the basic system primi- 
tives that MDBS-II may utilize for concurrent executions of multiple re- 
quests . 

6 .2 Two Kinds of Interfacing Approaches 

Operating systems have been characterized as either message-oriented or 
procedure-oriented, depending on how they implement the notions of process 
and synchronization [Laue79]. We could use either approach for implementing 
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the concurrency control mechanism of MDBS-II. 

Using a message-oriented operating system, there would be a fixed number 
of processes (one per MDBS activity). Directory management, for example, 
might be an activity, which could be implemented as a process. 
Synchronization is implemented by passing messages among processes. There is 
a relatively limited amount of direct sharing of data in the memory among 
processes. Processes for each activity are created when MDBS is started up. 
They are only deleted when MDBS is shut down. 

Using a procedure-oriented operating system, there would be a varying 
number of processes (one process per user). Synchronization is implemented by 
direct sharing and locking of common data in the main memory. Processes are 
rapidly created and deleted. 

In the following sections, we describe how each of the two kinds of op- 
erating system can be used for supporting concurrency control in MDBS-II. In 
order to simplify the discussion, we restrict the types of requests that are 
allowed. These restrictions mean that no changes to the directory informa- 
tion will be made. *To show the applicability of the approaches to MDBS, we 
give a simplified description of the operation of MDBS using each approach. 
The descriptions are based on the following assumptions: 

(1) There are n users. 

(2) Each user has submitted one or more requests so that there are k ac- 
tive requests in total. The requests arrive at the controller at 
times tl, t2, . . . tk. 

(3) Grouping of requests into transactions is not allowed. 

(4) Only retrieve and update requests are allowed. Records being modi- 
fied in an update request will not change cluster. Thus, there is no 
need for concurrency control in directory management since directory 
information will not change. 

(5) Concurrency control is done at the cluster level. For example, using 
a procedure-oriented operating system, locking is on clusters. 

(6) The scheduling of requests that reference common clusters is done 
using the concurrency control mechanism described in [Hsia81b]. 
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6.2*1 Concurrency Control in MDBS-II using Message-oriented Approach 

The interactions are shown in Figure 18. Requests are received at the 
controller and then broadcast to the backends. At each backend, a request is 
first input to the "directory management" process. This process determines 
the set of clusters needed by the request. The request and the cluster 
numbers of clusters determined are sent to the "scheduler” process. This 
process keeps a queue of requests waiting to be processed and a list of clus- 
ter numbers of clusters being accessed. This process takes a request off the 
queue if it can be scheduled, updates the list, and sends the request to the 
"request execution" process. The "request execution” process carries out the 
request, forwards the results to the controller, and sends a message back to 
the "scheduler" process indicating that the request is completed. When the 
"scheduler" process receives the message from the "request execution” pro- 
cess, it updates the list, releasing all those clusters accessed by the 
completed request. 



6.2. 2 Concurrency Control in MDBS-II using Procedure-oriented Approach 

The interactions are shown in Figure 19. In this approach each backend 
maintains a process for each active user. Thus, the number of "user” 
processes in MDBS-II is the product of the number of backends and the number 
of MDBS-II users. All "user” processes at one backend share a "cluster-lock" 
table. Thus, there are as many "cluster-lock” tables as there are backends. 

In carrying out a user request, the "user” process at each backend con- 
sults the "cluster-lock" table at that backend. If the needed clusters are 
not locked, then they are locked by the process. Furthermore, the request is 
carried out by the process. Upon completion of the request, the process un- 
locks the clusters from the "cluster-lock” table. If a needed cluster is 
locked, then the process must wait until the cluster is unlocked. We note 
that there is no explicit scheduler or request queue. Instead, requests are 
carried out on the availability of the needed clusters as reflected from 
their state in the "cluster-lock” table. 



Most database system implementations have used the procedure-oriented 
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Notes : 

(1) There are k requests 
r^, arriving at 
different times, 
i.e», t j , • • • 9 t^ . 

(2) All k requests are 
broadcast to each 
backend . 

(3) There are 3 
processes in a 
backend. 





, Controller 





(4) Interprocess 
communicat ions 
among the 
processes for 
exchanging the 
descriptors in 
the descriptor 
search phase are 
not shown here. 




Figure 18. The Message-Oriented Design for 
Concurrency Control in MDBS-TI 
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Notes : 

(1) There are n users. 

(2) There are m 
backends. 

(3) There are ra x n 
user processes. 

(4) There are m 
cluster-lock 
tables . 





(5) Interprocess 
communication 
among the 
processes for 
exchanging 
descriptors in 
the descriptor 
search phase 
are not shown 
here. 




Figure 19. The P rocedure-Oriented Design for 
Concurrency Control In MDBS-II 
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approach. However, it has been suggested that a message-oriented approach 
might be more efficient [Ston81]. We plan to investigate both approaches 
more fully before choosing one for our implementation. 
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APPENDIX A 

HOW TO READ AND FOLLOW THE PROGRAM SPECIFICATIONS 



In Appendices B, C and D, a large number of MDBS- I programs are des- 
cribed and specified. These programs represent those parts of MDBS-I that 
have been designed and implemented at this time. 

A.l Parts within an Appendix 

Each appendix begins with an introduction which outlines the major com- 
ponents of the design. For example, the design of test file generation, pre- 
sented in Appendix B, consists of two major components: one for generating 
random test data strings and the other for generating realistic test data 
sets. Accordingly, each major component is described and specified in a sep- 
erate part of the appendix. Thus Appendix B has Part I and Part II. 



A. 2 The Format of a. Part 

In each part, we provide the following documentation elements: 

(1) Title of the part, 

(2) Name of the design, 

(3) Name of the designer, 

(4) Date the design was first submitted, 

(5) Dates of design modifications, 

(6) Statements of the design purpose, and of the input and output re- 
quirements , 
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(7) Formal specifications of the input and output, if necessary, 

(8) Procedure names used in the design, 

(9) Data structures used in the design, 

(10) Program specification of the design. 



A. 3 Documentation Techniques for the Part 

In the previous section, we list 1:he various documentation elements. 
They are used to describe a design. Documentation elements 1 through 5 are 
written in English phrases. Document element 6 is written in prose. On the 
other hand, document elements 7 througli 10 can be expressed more effectively 
using other means as described in Chapter 2. Specifically, we use 
Backus-Naur form (BNF) for writing the specifications in document element 7. 

The procedure names of document element 8 are shown in a program hierar- 
chy as discussed in Section 2.2.4 and depicted in Figure 10. The use of the 
hierarchy makes clear the calling sequences of the procedures named. The 
data structures of documentation element 9 are specified in either SSL or in 
the C programming language. In documentation element 10, the procedures, 
themselves, are specified in SSL. 

Except for the programming team that writes the procedures, other teams 
will usually not be interested in the internal logic of the procedures. 
Consequently, they need only know the higher-level specifications of the pro- 
cedures. SSL as described in Section 2.2.2 and depicted in Figure 9 is an 
ideal specification language for revealing the design of the procedures from 
a top-to-bottom-and-layer-to-layer way. It also works well with the hierar- 
chical organization of procedures. 
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APPENDIX B 

THE SSL SPECIFICATION FOR TEST FILE GENERATION 



The program specification for test file generation is shown in this ap- 
pendix. The specification design is composed of two parts. In part two, all 
procedures and data structures that are required to define set members and 
then to select a particular member for a value in a record are specified. In 
part one, all the procedures that are not concerned with data sets are speci- 
fied. 



B.l Part I - Generating Random Test Data Strings 



/* 

/* 

/* 

* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

* 



’1) Part I - Generating Random Test Data Strings 
,2) Design: GENERATE FILE( FILE- NAME) 

,3) Designer: D.S. Kerr 
>4) Date: July 23, 1981 
.5; Modified July 30. 1981 
August 4, 1981 

August 11 , 1981 - removed SETS to Part II, no other changes 
August 25. 1981 - changed identification of set from 
SetNumber to SetPointer in AttributeDescription 
January 3, 1982 - description changed, no changes to the 
design itself 



(6) Purpose: 

The purpose of this system is to generate a file of test data which 
/* can be applied to MDBS. The user specifies the FILE-NAME, 

/* NUMBER-OF T RECORDS, and the NUMBER- OF -ATTRIBUTES-PER- RECORD. The user 
/* then specifies how the values of each attribute are to be chosen: 

/* randomly from a predefined set, randomly from a range of integers, or 
/* as a random character string. 

/* A set is characterized by a set-name, type( length) . number-of-members , 
/* and the members. It will be stored in a file called 
/* library of sets. 

/* 

/* 



set-name in a 



/* 

/* 

/* 



A distribution function, UNIFORM(min,max) , must be provided. It 
/* should generate a random integer between min and max. 

/* 

(7) Output: 

Output is a file of records where each record has the form 
fieldl$field2$ . . . $fieldn# 

/* The actual data output is a character string. $ is a special character 
/* to seperate fields, # is a special character to seperate records. A 
/* more formal definition is given below. 

/* 

/* Notation {item} 

/* [item] 

/* . 

/* file ::= label data 

/* label ::= numb er-of- records $ number-of-attributes $ attribute-body 
/* number-of-records ::= integer the number of records in the file 

/* number-of-attributes ::= integer the number per file 



means 1 or more occurrences of item 
means 0 or more occurrences of item 



*/ 

*/ 

*/ 

*/ 

*/ 

*1 

*/ 

* 

*/ 

*/ 

*/ 

*1 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*!, 

*/ 

*/ 

*/ 

*/ 

*/ 

*1 

*/ 

*/ 

*/ 

*/ 
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/* attribute-body {attribute-description $ } ... 

/* attribute-description ::=■ type(lengtn) source-domain $ distribution 
/* type INTEGER I STRING I FLOAT T others to be added later 
/* source-domain ::=* set-name I RANDOM- INTEGER | RANDOM-STRING 
/* distribution distribution-function(parameter-list) 

/* distribution-function ::=* UNIFORM I others to be added later 
/* parameter-list ::=* integer [, integer] ... 

I* length ::=* integer in bytes of an attribute 



/* integer 
/* set-name 
/* filename 
/* 

/* data : := 

/* record : 

/* field : : 

/* string : 

/* character ::=* dig 
/* digit 0 I 112 
/* letter lowercase 
/* lowercaseletter 
/* uppercaseletter 
/* specialcharacter 

/* 

/* 



{digit} ... usual definitipn 

3 filename sets are described below 

■ any legal RSX ( UNIX) filename 



{record#} ... 

- field l$f ield] .. 
string 

“ string character 
'* ‘ letter 



;it 



3 I 4 
etter 



c I d 

c Id 

e 



actual field value 
character 
I specialcharacter 
5 16 17 18 19 
uppercaseletter 



t] 

F 



•! 5 



ill 

% 

> 



1 £ 



m a 
M|N 
& 

[ 



Sli 



w x 
WlX 

+ I 

< 



1*1 



> I 



*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 



(8) Procedure Hierarchy for File Generation 

Generate 

File 

(TFG1) 



Define 

File 

(TFG11) 



Describe 

Attribute 

(TFG111) 



Load 

Sets 



(TFG12) 



(See Procedure Structure 



for Sets 



Write 

Attribute 

Description 

(TFG1311) 



Generate 

Data 

(TFG13) 



Write 

Data 

(TFG132) 



Generate Write 
Record Record 
(TFG1321) (TFG1322) 



Write 

Label 

(TFG131) 



Get Attribute 
Value 

(TFG1321 1 ) 



Stuff 

Attribute 

Value 

(TFG13212) 



SETS$ Random- 

RANDOM- Integer- 
VALUE Value 

(SET6) (TFG132111) 



Random- Random- 
String- Float- 

alue Value 

TFG132112) (TFG132113) 



K> 



(9) Data Structures 
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^define FILENAMELENGTH 10 /* maximum number of characters in a file name */ 

#define MAXATTRIBUTES 10 /* maximum number of attributes in a record */ 

#define MAXPARAMS 10 /* maximum number of parameters for distribution */ 

/* functions */ 

struct AttributeDescription { 

char Type; /* INTEGER = 'i' , STRING = 's' , FLOAT = 'f' */ 
int Length; /* number of bytes */ 

int Source;/* SET « 1 , RANDOM INTEGER = 2 , RANDOM STRING = 3 */ 
char SetName[ FILENAMELENGTH ]; /* defined only for sets */ 

** struct SetDef *SetPointer; /* to the set definition */ 

int Distribution: /* UNIFORM = 1 , . . . */ 
int Parameters [MAXPARAMS ] ; /* of the distribution */ 



struct FileDescription { 

char FileNamet FILENAMELENGTH ]; /* name of the file */ 
int NumberOf Records ; /* to be generated in the file */ 
int NumberOf Attributes ; /* in a record */ 
struct AttributeDescription De script ion [MAXATTRIBUTES ] ; 



(10) Program Specifications 



1. jot GEN ERATE_F I L E ; 

/* Uses: DEFINE FILE - to fill in the attribute descriptions */ 

/* LOAD STTTS - to load the sets */ 

/* GENERATE DATA - to generate the actual data */ 

perform DEFINE FILET 

• per form LOAD SETS ; 

• perform GENEHATE_DATA ; 

• enc i ob 



2.1. proc DEFINEJFILE; /* fill in file description */ 

2.2 read and store FileName, NumberOf Records , NumberOf Attributes ; 

2.3 Number Of At tributes <= MAXATTRIBUTES; 

/* Uses DESCRIBE_ATTRIBUTE( pointer_to_description ) - to read */ 
and store an attribute description */ 

2.4 int i; /* attribute subscript */ 

2.5 for i from 1 to NumberOf Attributes do 

2.6 perform DE3’CRIBE_ATTRIBUTE( &FiTeDescription.Description[ i] ) ; 

2.7 end for 

2.8 end proc 



4.1 proc GENERATE DATA: 

/* Uses tEe information in the file and attribute descriptions to */ 
/* generate the file. */ 

4.2 open file(file name); 

4.3 perform WRITE LABEL; 

4.4 perform WRITEHDATA; 

4.5 close f ile(f iTe_name) ; 

4.6 end proc 



4.3.1 proc WRITE LABEL; 

/**"Uses the information in the file and attribute */ 
/* descriptions to write the label. */ 



4.3.2 int i; /* attribute subscript */ 

4.3.3 $char is the special character used to seperate fields 

4.3.4 write NumberOf Records , $char, NumberOf Attributes , $char; 

4.3.5 for i from 1 to NumberOf At tributes do 

4.3.6 perform WRITE ATTRIBUTEJD ESCRIPTTONC 

4.3.7 &FileDescription.Description[ i ] ); 

4.3.8 write $char; 

4.3.9 end for 

4.3.10 end proc 
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4.4.1 proc WRITE DATA; 

/★"Uses the information in the file and attribute descriptions 

*/ 

/* to generate and write the records. */ 

Uses: GENERATE RECORD( record ) returns record as a 

character string 

WRITE_RECORD( record ) where record is the . character 
string to be added to the output file. 

4.4.2 int rec_no; /* index for records */ 

4.4.3 #char is the special character used to seperate records 



4.4.3 

4.4.4 

4.4.5 

4.4.6 

4.4.7 

4.4.8 

4.4.9 



for rec no from 1 to number of records 

<0T “ ~ “ “ 

perf orm GENERATE RECORD( record ); 
perform WRITE RECORD ( filename, record ); 
Write #char ; 

end for ; 
end proc 



4.5.1 proc GENERATE_RECORD( record); 

output: record - character string 

Uses: GET_ATTRIBUTE_VALUE( pointer_to description ) - 

to determine a value (character string ) for 
this attribute 

STUFF_ATTRIBUTE_VALUE( value, record ) - appends 
value to record. 

4.5.2 int attr_no; /* index for attributes */ 

4.5.3 string value; /* for a particular attribute, in string form */ 

4.5.4 for attr no from 1 to number of attributes do 

4.5.5 value :="UET ATTElBUTE_V£LUh( 

FiTeDescriptTon.Description[ attr no ] ) ; 

4.5.6 perform STTJFF_ATTRIBUTE_VALUE( value , record )“ 

4.5.7 end for ; 

4.5.8 return (.record) ; 

4.5.9 end proc 



/* end scope of FileDescription */ 



2.6.1 proc DESCRIBE ATTRIBUTE ( DescriptionPtr ); 

input: UescriptionPtr - /* points to AttributeDescription */ 

2.6.2 struct AttributeDescription { 

2.6.3 char Type; /* INTEGER = 'i' , STRING = 's' , FLOAT = 'f' */ 

2.6.4 int Length; /* number of bytes in string to be generated */ 

2.6.5 int Source;/* SET = 1 , RANDOM_JNTEGER = 2 , RANDOM_STRING = 3 

*/ 

2.6.6 char SetNamef FIL ENAMEL ENG TH ]; /* defined only for sets */ 

2.6.7** struct SetDef *SetPointer; /* to the set definition */ 

2.6.8 int Distribution: /* UNIFORM =1 , . . . */ 

2.6.9 int Parameters [MAXPARAMS] ; /* of the distribution */ 

2 . 6.10 } 

/* enter attribute into attribute table */ 

2.6.11 define Type; 

2.6.12 define Length; 

2.6.13 define Source; 

2.6.14 if_ Source is SET 

2.6.15 then 

2.6.16 define SetName; 

2.6.17 define Distribution; 

2.6.18 define Parameters; 

2.6.19 end proc 
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4. 3. 6.1 proc WRITE ATTRIBUTE DESCRIPTION DescriptionPtr ); 

/* Writes the description of one attribute into the output file */ 

4. 3. 6. 2 $char is the special character used to seperate fields 

4. 3. 6. 3 Write the description; 

4. 3. 6. 4 end proc 



4. 5 .5.1 proc GET ATTRIBUTE VALUE( DescriptonPtr ); 

input: DescriptionPtr /* points to AttributeDescription */ 

returns: value character string 



Uses: SETS$RANDOM VALUE( DescriptionPtr ) 

RANDOM INTEGER VALUE( DescriptionPtr ) 

RANDOM - STRING VALUE ( DescriptionPtr ) 
RAND0OL0AT_VALUE( DescriptionPtr ) 

each returns a character string representation 
of the appropriate value 



4.5.5 

4.5.5 

4.5.5 

4.5.5 

4.5.5 

4.5.5 

4.5.5 

4.5.5 

4.5.5 



2 struct AttributeDescription { 



FLOAT « 'f' */ 



•3 char Type; /* INTEGER = 'i' , STRING = 

.4 int Length; /* number of bytes */ 

•5 int Source; /* SET * 1 , RANDOM INTEGER = 2 , RANDOM STRING 

•6 char SetNameL FIL ENAMEL ENGTH I; /* defined only for sets */ 

.7** struct SetDef *SetPointer; /* to the set definition 

•8 int Distribution; /* UNIFORM =1 , • . • */ 

•8 int Parameters [MAXP ARAMS ] ; /* of the distribution */ 

• 10 > 

/* get a value for attribute attr_no */ 



3 */ 
*/ 



4,5. 5-11 case DescriptionPtr • Source value 

4,5. 5,12 ^set' : value 

SETS$RANDOM_VALUE( DescriptionPtr ); 



4,5,5,13 'integer' : value := 

RANDOM_INTEGER_VALUE( DescriptionPtr ); 



4,5. 5. 14 'string' : value := 

RAND 0M_S TRI NG_VALU E ( DescriptionPtr ); 



4.5.5.15 'float : value := 

RAND0M_FL0AT_VALUE( DescriptionPtr ); 

4.5.5.16 end case 

4.5.5.17 re turn (value) ; 

4.5.5.18 end proc 



4. 5. 6.1 proc STUFF ATTRIBUTE VALUE(value , record) ; 

input: value - character string 

input/output: record - character string 

4. 5. 6. 2 /* puts value into record */ 

4. 5. 6. 3 end proc 



4. 4. 6.1 proc WRITE RECORD ( file_name, record ); 

input: file — name, record 

/* actually writes record to file file_name */ 

4. 4. 6. 2 end proc 

4.5.5.13.1 proc RANDOM INTEGER VALUE( DescriptionPtr ); 

input: DescriptionPtr 

/* returns an integer value as a character string */ 

4.5.5.13.2 end proc 

4.5.5.14.1 proc RANDOM STRING_VALUE( DescriptionPtr ); 

input : DescriptionPtr 

/* returns a character string value */ 

4.5.5.14.2 end proc 

4.5.5.15.1 proc RANDOM FLOAT_VALUE( DescriptionPtr ); 

input: DescriptionPtr 

/* returns a floating point number as a character string */ 

4.5.5.15.2 end proc 
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B.2 Part II ^ Generating Realistic Test Data Sets 



/* 

/* 

/* 

/* 

/* 

* 

/* 

/* 

/* 

/* 

/* 



|1) Part II - Generating Realistic Test Data Sets 
,2) Design: SETS in File Generation 
>3) Designer: D. S. Kerr 
,4) Date: July 23, 1981 
.5) Modified: July 30. 1981 
August 4, 1981 
August 24, 1981 

August 27, 1981 Minor changes after final walkthrough 



(6) Purpose: 

A set is characterized 



by a set_name, type, length, NumberOfMembers, 



/* and the members. It will be stored m a file called set_name in a 
/* library of sets. 

/* 

/* (7) Input and Output Data 

/* The form of the data in a file is shown below. $ is a special character */ 



/* used to seperate fields in the file 
/* used to seperate members. 

/* 

/* set_f ile_name any legal RSX ( 

/* set_file ::■ set_label set_data 
/* set_label ::= DataType $ DataLength 
/* set data ::= { Member # } ... 

/* MemFer ::= { character } ... 



label. # is a special character 

UNIX) filename 
$ NumberOfMembers $ 



*1 

*/ 

*/ 

*/ 

*/ 

*1 

*1 

*/ 

*/ 

*/ 



*1 

*/ 

*1 

*/. 

*/ 

*/ 



/* DataType ::=* INTEGER I STRING 
/* DataLength ::= integer 
/* NumberOfMembers : : = integer 
/* integer ::= { digit } ... 

/* 

/* character ::= digit I letter 
/* digit : := 0 | 1 T 2 I 3 I 4 
/* letter ::= lowercasel$tter 



FLOAT I others to be added later 
in bytes, of a set member 
the number pf members in the set 
usual definition 



/* lowercaseletter 
/* uppercaseletter 
/* specialcharacter 
/* 

/* 

/* 



= a b 
= A I B 
:= ! 



I specialcharacter 
I 5 T 6 I 7 | 8 I 9 
uppercase lpt ter 





*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 
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(8) Procedure Structure for Sets 



Load 

Sets 

(TFG12) 



Get 

Attribute 

Value 

(TFG13211) 



+■ 



+ 



SETSM$ 
START 
( SET1 ) 



SETSM$ SETSM$ 
STATUS LOADED 
CHECK “ ( SET3 ) 
( SET2) 



SETSM$ SETSM$ 

IN FILE DEFINE 
(SET4) LOAD AND 
SAVE" “ 
( SET5 ) 



-+ +-+- 



LOADED IN FILE 
AND ERROR ANTJ ERRUR 
(TFG121) (TFG122) 



+■ 



+ 



SETSM$ 
RANDOM 
VALUE “ 
( SET6 ) 



NEXT READ MEMBER 
SET ~ FROMTILE ' 
( SET21 ) (SET41) 



READ MEMBER SAVE MEMBER 
FROM TERMINAT IN FILE ‘ 
TSET51) CSET52) 



SET MEMBERSM$ 
START 
(SMI) 



SET MEMBERS M$ 
STORE 
MEMBER 
(SM2) 



I 
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(10) Program Specifications 



proc L0AD_SETS; 

/* For each set to be loaded in main memory, fill in SetDef inition. */ 

/* Also fill in SetPointer in Attribut ^Description. Set may already */ 

/* be loaded for a previous attribute. It may also be in a previously */ 
/* defined library of sets. If it is not already defined, then it must */ 
/* be read from a terminal. b */ 

/* Any new sets defined will be added to this library. */ 

Uses : 

SETSM$START - to initialize SETSM module 

SETSM? STATUS CHECK( NamePtr, Type, Length, Status, SetPtr ) 

- Returns the status of set NamePtr . Also returns SetPtr, a 
pointer to a structure of type SetDef if there were no errors. 
Possible values of Status are: 

LOADED - Already loaded in primary memory for a previous attribute 

LOADED AND_ERR0K - Loaded but set description ana attribute 
description do not match 

IN_FILE - Already defined in a file but not yet loaded in primary 
memory 

IN_FILE_AND_ERROR - Defined but set description and attribute 
description do not match 

NEW - Set not yet defined 

SETSM$LOADED( NamePtr, SetPtr ) - Returns SetPtr for set 'NamePtr' 
which is already loaded in primary memory. 

SETSM$IN_FILE( NamePtr, SetPtr ) - Loads set 'NamePtr' from file 
into primary memory. Returns the corresponding SetPtr. 

SETSM$DEFINE LOAD_AND_SAVEt NamePtr, Type, Length, SetPtr ) - When 
set TTas not previously been defined. Reads the set 
from the terminal, loads it into primary memory, and 
saves it in a file for future use. SetPtr is returned. 



LOADED_AND_ERROR - Used to fix error. 

IN_FILE_AND_ERROR - Used to fix error. 

int Status; /* of a set takes on values shown above */ 
int i: /* attribute subscript */ 

char *CurrentAttributePtr : /* pointer to the current attribute if a set.*/ 
set description *SetPtr; /* pointer to the set description */ 

perform SETSM$START; 

for 1 from 1 _to NumberOf At tributes do 

if Til eDe script ion. Description [TT. Source is SET 
then /* tnen there are 3 cases */ 

/* 1 . set already loaded in memory for a previous attribute*/ 
/* 2. set already defined in a file */ 

/* 3. set must be read from terminal, loaded and saved */ 
CurrentAttributePtr = &FileDescription.Description[ i] ; 
perform SETSM$STATUS CHECK( CurrentAttributePtr->SetName , 
CurrentAttriFutePtr->Type , 

CurrentAttributePtr- >Length, 

Status. SetPtr ); 
case Status value 

LOADED: perform SETSM$LOADED( SetPtr ); 

IN FILE: perform SETSM$IN FILE( SetPtr ); 

NEW: pertorm"‘SETSM$DEFINE^LOAD AND SAVE( SetPtr ); 
LOADED ANT) ERROR: perf omTLOADTD AND ERROR; 

I N_F I LT_ANU_ERROR : perform I N_F irE_AND_ERROR ; 
end case ; 

CurrentAttributePtr->SetPo inter - SetPtr ; 
end if ; 
end for ; 
end proc 



proc LOADED_AND ERROR* 

/* May ask Tor redefinition of set name or attribute description. */ 
end proc 



proc IN FILE AND_ERROR ; 

/* Nay ask for redefinition of set name or attribute description. */ 
end proc 



4>U>lo 
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module SETSM; 

/* Values of defined constants must be determined before system */ 
/* is fully operational. */ 

#define FILENAMELENGTH 10 /* maximum number of characters in a file name */ 
#de fine MAXSETS 10 /* maximum number of sets allowed */ 

#define MAXMEMBERS 10 /* maximum number of members in a set */ 

#define MAXSETSTORAGE 10 /* maximum storage to hold the sets */ 

struct SetDef { 

char Name[ FILENAMELENGTH]; 

char DataType; /* INTEGER = 'i' , STRING = 's' , FLOAT = 'f' */ 
int DataLength; /* number of bytes */ 

int NumberOfMembers ; /* in the set. When set is being stored */ 
/* is the number of members currently in the set. */ 
char *MemberPtr [MAXMEMBERS] ; /* MemberPtr[i] points to the */ 

/* character string value of the i-th member of this set. */ 

struct SetDef SetDef inition [MAXSETS ] ; /* one for each set */ 

struct SetDef *SetAvailablePtr ; /* points to next available set */ 

exported: START, STATUS_CHECK, LOADED, IN FILE, DEFINE LOAD AND SAVE, 

RAND 0M_VALU E ~ ~ 

internal: READ MEMBER FROM FILE. READ MEMBER FROM TERMINAL, 

“ SAVE_MEMBEE_IN_FILE “ “ “ 

jses START and STORE MEMBER from SET MEMBERS M module 



1.1 sroc START: 

/* initializes SetAvailablePtr and SET MEMBERSM module */ 
Initialize SetAvailablePtr to initial HetDef inition. 
perform SET_MEMBERSM$START ; 
end proc 
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.2 

.3 
.4 
.5 
.6 
• 7 
.8 
.9 

.10 

.11 

.12 

.13 

.14 

.15 

.16 

.17 

.18 

.19 

.20 

.21 

.22 

.23 

.24 

.25 

.26 

.27 



.28 

.29 

.30 



.18.1 



.18.2 

18.3 

.18.4 



.1 



proc STATUS_CHECK( input : SetNamePtr, Type, Length, 
output : Status, SetPtr ;; 
input: SetNamePtr /* or set to be checked */ 

Type /* from attribute description */ 

Length /* from attribute description */ 

output; Status /* of set 'NamePtr'. Possible values are: 

LOADED - Already loaded in primary memory 

LOADED AND ERROk - loaded but descriptions do not match 

IN FILTT - Xlready defined in a file 

INT’ILE^AND^ERROK - Defined but descriptions don't match 
NEW* - not yet defined */ 

SetPtr /* Pointer to structure of type SetDef. This set 
/* is to be defined if there are no errors. */ 

/* SetName pointed to by SetNamePtr is the same as the name of the file */ 

/* which holds the set. */ 

if there exists a set i , such that SetDef inition [ j j .Name matches 
set name identified by SetNamePtr 
then /* Check set description and attribute description */ 
if descriptions match 
then 

Status = LOADED; 

define SetPtr to point to SetDef inition [\ ] ; 
else Status = LOADED_AND_ERROR ; 
end if ; 

else /* There are still two possibilities. First, check for set */ 

/* in library of sets */ 
open file( SetNamePtr ); 
if open successful 

then /* check set description and attribute description */ 
read set description from file; 
if descriptions match 
then 

Status = IN FILE: 
perform NEXT_SET( SetPtr ); 

store set name, type, length & NumberOf Member s 
in set description: 
else Status = IN_FILE_AND_ERR0R; 
end if ; 

else /* Since open was not successful, set must not have */ 

/* been previously defined */ 

Status = NEW; 

perform NEXT_SET( SetPtr ); 

store set name, type and length in set description* 

/* Note that NumberOf Members is not known until the */ 
/* set has been read from the terminal. */ 

end if ; 
end if ; 
end proc ; 



internal procedure which requires access to all of SetDef inition 



proc NEXT_SET( output : SetPtr ); 

/* Returns a pointer to the next available set. Increments */ 

/* SetAvailablePtr . */ 

/* SetPtr and SetAvailablePtr - pointers to structures of type SetDef. */ 

SetPtr = SetAvailablePtr; 

Increment SetAvailablePtr ; 

end proc 



end_scope_jDf SetDef inition 



proc LOADED ( input : SetPtr ); 

/* actually does not have to do anything */ 

input: SetPtr /* a pointer to a structure of type SetDef */ 

end proc 
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4.1 proc IN_FILE( input : SetPtr ); . 

input: betVtr /* a pointer to a structure of type SetDef, the */ 

/* set to be input */ 

/* Reads in the actual set members from the file and */ 

/* stores them in set SetPtr. */ 

4.2 struct SetDef { 

4.3 char Name[ FILENAMELENGTH]; 

4.4 char DataType; /* INTEGER = 'i' , STRING = 's' , FLOAT = 'f' */ 

4.5 int DataLength; /* number of bytes */ 

4.6 int NumberOfMembers ; /* in the set. When set is being stored */ 

/* is the number of members currently in the set. */ 

4.7 char *MemberPtr [MAXMEMBERS ] ; /* MemberPtr[i] points to the */ 

/* character string value of the i-th member of this set. */ 

4.8 } 

4.9 int i; /* index for set members */ 

4.10 /* read and store the members */ 

4.11 for i from 1 to NumberOf Members do 

4.12 perform READ MEMBER FROM FILET SetPtr- >Name , MemberValue ); 

4.13 perform SET_HEMBERSlf$STOEE MEMBERC input : MemberValue, 

output : NewMemberFtr ); 

4.14 MemberPtiT l ] = NewMemberPtr; 

4.15 end for ; 

4.16 close file( SetPtr- >Name ); 

4.17 end proc 



5.1 



5.2 

5.3 

5.4 

5.5 

5.6 

5.7 

5.8 

5.9 

5.10 



5.11 

5.12 

5.13 

5.14 

5.15 

5.16 

5.17 

5.18 

5.19 

5.20 

5.21 

5.22 

5.23 

5.24 

5.25 

5.26 

5.27 

5.28 

5.29 

5.30 

5.31 



proc DEFINE LOAD AND SAVE( input : SetPtr ); 

input: SetPtr /"** pointer to structure of type SetDef */ 

/* Set has not previously been defined. Reads the set from the *7 

/* terminal, loads it into primary memory, and saves it in a file for */ 
/* future use. Stores Number Of Members in set description. */ 

struct SetDef { 

char Name [ FILENAMELENGTH]; 

char DataType; /* INTEGER = 'i' , STRING = 's' , FLOAT = 'f' */ 
int DataLength; /* number of bytes */ 

int NumberOfMembers ; /* in the set. When set is being stored */ 
/* is the number of members currently in the set. */ 
char *MemberPtr [MAXMEMBERS ] ; /* MemberPtrfi] points to the */ 

/* character string value of the i-th member of this set. */ 



int i; /* index */ 

SetNamePtr /* pointer to the name of the set */ 

/* Define and load set into primary memory. */ 
i = 0; 

perform READ MEMBER_JROM_TERMINAL( MemberValue ); 
while { MoreKembers ) do 
if i > MAXMEMBERS 

then perform ErrorRoutine ; 

perform SETT1 eMBERSM$ STORE MEMBERC input : MemberValue, 
output : NewMemberFtr , ErrorStatus ) ; 
if ErrorStatus = N0_SPACE 

then perform ErrorRoutine; 
increment ij 

MemberPtr[ l ] = NewMemberPtr; 

perform READ_MEMBER__FROM_TERMINAL( MemberValue ); 
end while ; 

store l in NumberOfMembers in SetDef; 

/* Save set in file. */ 

SetNamePtr = SetPtr- >Name ; 
open file( SetNamePtr ); 
write set description to file: 

/* write set members to file */ 

for i from 1 to NumberOfMembers do 

perform SAVE MEMBER IN FILE(“EetNamePtr , 
EetPtr-^MemberPtr [ i ] ); 

end for ; 

close file( SetNamePtr ); 
end proc 



5.32 
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1 



2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 
17 



18 



proc RANDOM_VALUE( input : AttributeDescript ionPtr ); 

/ ^returns pointer to value */ 

input: AttnbuteDescriptionPtr /* pointer to structure of type */ 

/* AttributeDescription */ 

struct AttributeDescription { 

char Type; /* INTEGER = i , STRING - 's' , FLOAT = 'f' */ 
int Length; /* number of bytes */ 

int Source;/* SET = 1 , RANDOMINTEGER - 2 . RANDOMSTRING * 3 */ 
char SetName[ FIL ENAMEL ENGTH ]; /* defined only for sets */ 
struct SetDef *SetPointer; /* to the set definition */ 
int Distribution: /* UNIFORM =1 , . . . */ 
int Parameters [MAXP ARAMS ] ; /* of the distribution */ 



struct SetDef { 

char Name [ FIL ENAMEL ENGTH] ; 

char DataType; /* INTEGER » 'i' , STRING * 's' , FLOAT = 'f' */ 
int DataLength; /* number of bytes */ 

int NumberOf Members ; /* in the set. When set is being stored 

/* is the number of members currently m the set. 
char *MemberPtr [MAXMEMBERS ] ; /* MemberPtr [ i ] points to the *, 

/* character string value of the i-th member of this set. */ 



/* Uses SetPointer, Distribution, Parameters */ 

/* to find the value of a random member of the set */ 

/* Note that different attributes may use the same set, but */ 

/* with different distribution functions. */ 

end proc 



end_o f _exp o r t e d_ p r o c e du r e s 



start_of_internal_procedures 



12.1 proc READJ4EMB ER_FR0M_F ILE ( input : SetNamePtr, 

output : MemberValue ) ; 

/* Reads a member from file SetNamePtr */ 

12.2 end proc ; 



12.1 proc READ MEMBER FROM TERMINAL ( output : MemberValue ); 

/* Reads a member from terminal */ 

/* Uses NULL to indicate no more members. */ 

12.2 end proc ; 



29.1 proc SAVE MEMBER IN FILE( input : SetNamePtr, MemberPtr ); 

input: SetNamePtr 
MemberPtr 

/* writes member into file SetNamePtr */ 

29.2 end proc ; 



end_of_internal_procedures 
end module SETSM 



# Ar 
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module SETJMEMBERSM 

c¥ar SetData[MAXSETSTORAGE] ; /* holds the set members - pointed to */ 

/* by MemberPtr */ 

char *MemberAvailablePtr ; /* Points to 1st available space in */ 

/* SetData */ 

/* first character of member m of set s is SetDef inition[s] .MemberPtr [m] */ 
/* last character is SetDef inition[s] .MemberPtr [m+DataLength-1 ] */ 

/* SetDef inition[s ] .MemberPtr [m+DataLength] is NULL ( = '\0') */ 

1 .1 proc START; 

/* Initialize MemberAvailablePtr to beginning of SetData. */ 

1 .2 end proc 



2.1 



2.2 

2.3 

2.4 

2.5 

2.6 

2.7 

2.8 
2.9 



proc STORE_MEMBER( input : Member Value , 

output : NewMemberPtr , ErrorStatus ); 

/* MemberValue is the value to be stored. It is stored in the next */ 
/* available spaces in SetData. NewMemberPtr is returned after it */ 
/* is set to point to this value. MemberAvailablePtr is incremented.*/ 
/* ErrorStatus is set to OK. If there is no room, then ErrorStatus */ 
/* is set to N0_SPACE. */ 



/* It should be noted that only STORE_MEMBER requires access to all */ 
/* of SetData. All other routines get pointers to a particular */ 
/* value in SetValue and use only that particular value. */ 



/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 



SetData looks like 



I value NULL value NULL . . . last-value NULL unused I 



where MemberAvailablePtr points to the first available space. 



*/ 

*/ 

*/ 

*/ 



if there is not enough space 

then ErrorStatus = N0__SPACE; 
el slT 

TlrrorStatus = OK; 

Store MemberValue in SetData from MemberAvailablePtr on; 
NewMemberPtr = MemberAvailablePtr; 
increment MemberAvailablePtr ; 

end if 



2 .10 end proc 



end module SET MEMBER SM 
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APPENDIX C 

THE SSL SPECIFICATION FOR DATABASE LOAD 



The program specification for database load is shown in this appendix. 
The specification design is composed of two parts. Part one includes all 
procedures for the database load subsystem. Part two includes the specifica- 
tions for the Record Template module. 



C.l Part I_ ^ Database Load Subsystem 



/* 

/* 

/* 

/* 

* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 



;i; 

2 

3 

4) 

.5] 



( 6 ) 



Part I - Database Load Subsystem 
Design: DATABASE LOAD UTILITY 

Designer: P. R. Strawser 

Date: August 25, 1981 

Modified: September 16, 1981 

Changes marked with '**' in SSL. 



*/ 

*/ 

*/ 

* / 
*1 

*( 

*/. 



(7) 



Purpose : 

The database load utility gives MDBS users 
the capability to load pre-existing data from */ 
other database systems or other files into */ 
the MDBS system. This utility is designed to */ 
run on the MDBS controller. */ 

The files being loaded are assumed to be of */ 
fixed length records, all records in a file */ 
having the same format. The files are also */ 
assumed to be resident at the controller. */ 

The database load utility, using these files */ 
and other information supplied oy the user, */ 
constructs the DDIT and AT tables required by */ 
the directory management subsystem. It also */ 
formats the input records as required for */ 

storage in the database, organizes the records*/ 
into clusters, and distributes the records and*/ 
the directory management data to the backends.*/ 
*1 

Output: */ 

Output is DDIT and AT information for the */ 

directory management subsystem, and records */ 
formatted for storage at the backends. */ 
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(8) Procedure Hierarchy for Database Load 



//FILEPREP 

(DBL11) 



// DESCRDEF 
(DBL111) 



//DBLOAD (DBL1) 



//TYPEADEF 
(DBL1111) 

//TYPEBDEF 
(DBL1112) 

//TYPECLST (DBL1113)' 

// REVDESCR (DBL1114) 

/ //attrchar/X / 

//SRCHCLST /,, (DBU121) X 

(DBL1122) //SRCHCLST (DBL1122) X 

//REVRTEMP (DBL1123; 





// RTEMPDEF 
(DBL112) 



/DRVAORB (DBL1131) 



/ DRVKWORD 
(DBL113) 



/ DRVC (DBL1132) 



Procedures on the left 
of a solid line are the 
subprocedures of the 
procedure on the right 
of the solid line. 




// LOAD! 
(DBL1 



/ PUTINLST (DBL1133) 



BLDSRT (DBL1134) 



// PROCLUST 
CDBL141) 



REVTYPEC (DBL1135) 

/GETRAND (DBL1411)' 




/ Coding is completed; walkthrough 
is completed; test is to start. 

// Testing is completed also. 



/DISTRREC (1412)' 



Procedures on the left of a dotted 

line are also the subprocedures of 
the procedure on the right of the 
dotted line. 



/newclust 

(DBL14121/ 
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(9) Data Structures 

Data structure definitions are included at the beginning of each procedure 
definition in (10) below. 

(10) Program Specifications 

First Level Specifications for Database Load 

1. subsystem DATABASEJLOAD; /* DBLOAD (DBLl ) */ 

/* Prepare data for initial load into the database. */ 

/* Define descriptors and prepare records for */ 

/* loading into the database. */ 

if clusters to be formed at file level 
Then 

perform FILE PREP 

else 

perform DAT ABAS E_P REP 
end if; 

/* Sort data into clusters. */ 

8 . perform SORT_INTO_CLUSTERS ; 

/* Load data into database store. */ 

9. perform L0AD_DATA; 

10 . end subsystem; 
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Second Level Specifications for Database Load 



4.1 proc FILE_PREP; /* FILEPREP (DBL11 ) */ 

/* Prepare files for loading to the database store. Clusters will be */ 
/* defined at file level. */ 

4.2 array type-C_ attr__names ; /* Attribute names over which type-C */ 

/* descriptors will be defined. */ 

4.3 arglist = (clustering level, /* "FILE" . */ 

record^ type, /* Type records in the current file, */ 

/* (Payrpll, Employee , Inventory ...)*/ 
database name, /* Generic name for this database. */ 

~~ /* e.g. PERSONNEL, PARTS, CONTRACTS. */ 

pointer to type-C__attr_names) ; 



4.4 


scalar atpointer 


/* 

/* 


Pointer 

created 


to instance of AT 
for this task. 


*/ 




rectemppointer ; 


/* 

/* 


Pointer 

file. 


to RTEMP for current 


*/ 

*/ 



4.5 arglist .clustering^level := "FILE”; 

4.6 get arglist . database_name from terminal; 

4.7 while more files to be loaded do 

4.8 begin 

4.9 get arglist . record_type from terminal; 

/* Define all descriptors for this file. */ 

/* Argument list constructed as above is passed to */ 

/* the DEFINE_J)ESCRIPTORS procedure, which returns */ 

/* a pointer to the instance of AT created for this*/ 

/* task. */ 

4.10 perform DEFINE_DESCRIPTORS( arglist , 

atpointer) ; 

4.11 arglist + atpointer; 

/* Define recprd structure via a record template. */ 

/* Argument list constructed as above is passed to */ 

/* this procedure, which returns a pointer to the */ 

/* record template created for this file. */ 

4.12 perform DEFINE_RECTEMP( arglist , 

rectemppointer ) ; 

4.13 arglist + rectemppointer; 

/* Examine each record in the file, and determine */ 
/* the set of descriptor ids representing descriptors */ 
/* from which keywords in that record can be derived. */ 
/* Create recprds to be sorted into clusters. */ 

/* Argument list constructed as above is passed to */ 
/* this procedure. */ 

4.14 perform DERIVE_J)IRECTORY_KEYWORDS( arglist) ; 

4.15 end while 



4.16 end proc; 
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6.1 proc DATABASE_J?REP ; /* DBPREP (DBL12) */ 

/* Prepare data for loading to the database store. Clusters will be 
defined at database level. */ 

6.2 array type- C_attr_ names ; /* Attribute names over which type-C */ 

/* descriptors will be defined. */ 

6.3 arglist = (c lus tering_ level , /* "DATABASE" */ 

iecord_type, /* Type records in the current file. */ 

/* (Payroll, employee, inventory ...)*/ 
database name, /* Generic name for tnis database, */ 
/* e.g. PERSONNEL, PARTS, CONTRACTS. */ 
pointer to type-C_ attr_names) ; 



6.4 


scalar atpointer, 


/* 

/* 


Pointer 

created 


to instance of AT 
for this task. 


*/ 

*/ 




rectemppointer ; 


/* 


Pointer 

file. 


to RTEMP for current 


*/ 

*/ 



6.5 arglist .clurtering level : = "DATABASE"; 

6.6 get arglist . database_jiame from terminal; 

/* Define all descriptors for this database. */ 

/* Argument list constructed as above is passed to */ 

/* the DEF INE_J)ESCRIPTORS procedure, whicn returns */ 

/* a pointer to the instance of AT created for this*/ 

/* task. */ 

6.7 perform DETINE_J)ESCRIPTORS(arglist , 

atpointer ) ; 

6.8 arglist + afipointer; 

6.9 while more files in this database do_ 

6.10 begin 

6.11 get argl ist . record_type from terminal; 

/* Define record structure via a record template. */ 

/* Argument list constructed as above is passed to */ 

/* this procedure, which returns a pointer to the */ 

/* record template created for this file. */ 

6.12 perform DEFINE__RECTEMP( arglist , 

rectemppointer ) ; 

/* Examine each record in the file, and determine */ 
/* the set of descriptor ids representing descriptors */ 
/* from which keywords in that record can be derived. */ 
/* Create records to be sorted into clusters. */ 

/* Argument list constructed as above is passed to */ 
/* this procedure. */ 

6.13 arglist + rectemppointer; 

6.14 perform DERIVE__D I RECTORY KEYWORDS 

(arglistT; 

6.15 end while 



6.16 end proc; 
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8.1 proc SORT_INTO_CLUSTERS; /* SRTCLUST (DBL13) */ 

/* Records to be sorted have the form: # # */ 
/* record 38 (descr_count , Number of descriptors in descr_ids*/ 
/* list which follows. */ 
/* descr_ids, List of descriptor ids from which */ 
/* this record can be derived. */ 
/* database_record) ; Record formatted into form */ 
/* required for storing in the data- */ 
/* base store. */ 



8.2 open files; 

8.3 sort records in ascending sequence by 

descriptors_ids within descr__count ; 

8.4 close files; 

8.5 end proc; 



9.1 proc L0AD_DATA; /* LOADDATA (DBL14) */ 

/* Distribute clusters of data across the multiple backends */ 
/* according to the track-splitt ing-with-random-placement */ 
/* data placement strategy. */ 



9.2 scalar cdtpointer; /* From CDTM module CREATE. */ 

9.3 record = (descr_count , /* Number of descriptors in descr_ids*/ 

/* list which follows. */ 

descr_ids, /* List of descriptor ids from which */ 

/* this record can be derived. */ 

database_record) ; /* Record already formatted into form*/ 

/* required for storing in the data- */ 

/* base store. */ 

9.4 system_info - (number of backends, /* This data assumed to be */ 

** backenH_acTdresses , /^available in some system*/ 

t:rack_capacity ) ; /* generation file accessi-*/ 

/* Ble through some module */ 
/* called SYSDATA. */ 



/* Create an instance of CDT for this task. The CREATE function of*/ 
/* the CDT module returns a pointer to the instance of CDT created */ 
/* for this database. */ 

9.5 perform CDTM$CREATE(cdtpointer ) ; 

/* Get from the system the information required for this task. */ 

9.6 perform SYSDATA$INFO( system_inf o) ; 

/* Read the first record. */ 

9.7 open file of sorted records and read first record; 



^•8 while more clusters in sort file do 

9 .9 begin 

9 .10 perform PROCESS_A_CLUSTER( record , 

system_inf o , 
cdtpointer) ; 

9.11 end while 



9.12 end proc; 
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Third Level Specifications for Database Load 



4.10.1 proc DEFINE__DESCRIPTORS /* DESCRDEF (DBLlll) */ 
( input : inpointer(clustering_level , 
record_type , 
database name) 
type-C_aFtr_names ) , 
output : atpo inter ) ; 



/* Define descriptors for this file or database, and store them */ 
/* in the DDIT created for this task. */ 
/* Input is a pointer to an list which contains */ 
/* clustering_level ('.'FILE" or "DATABASE") , record type, */ 
/* database_name, a list of attribute names over which type-C*/ 
/* descriptors are to be defined. */ 
/* Output is a pointer to the instance of AT created for */ 
/* this task. */ 



4.10.2 scalar name, /* Name to identify AT */ 

atpointer; /* Pointer to instance of AT */ 

/* created for this task. */ 



4.10.3 

4.10.4 

4.10.5 

4.10.6 

4.10.7 

4.10.8 



4.10.9 

4.10.10 

4.10.11 

4.10.12 

4.10.13 



if inpointer . clustering_level = "database" 

Then 

name := inpointer . da tab as e__name ; /* NOTE: may also want */ 

else /* to indicate cluster- */ 

name := inpointer . racord_type ; /* ing level here. */ 

end if ; 

/* CREATE function of module ATM returns a pointer to */ 

/* the instance of AT created for this task. */ 

perform ATM$CREATE(name , atpointer); 

if unsuccessful create 
then 

display message 

else 



4.10.14 



4.10.15 

4.10.16 

4.10.17 



4.10.18 

4.10.19 

4.10.20 



4.10.21 



4.10.22 

4.10.23 



begin 

/* First all type-A descriptors. */ 
while more type-A descriptors do 

perform DEF_TYFE-A_DESCR( atpointer ) ; 
end while ; 

/* Then all type-B descriptors. */ 
while more type-B descriptors do 

perform DEF — TYPE- B__DESCR( atpointer ) ; 
end while ; 

/* Build an array of attribute names over which type-C */ 
/* descriptors are to be defined when input is read. */ 

perform LIS T_TYP E-C_ATTR_NAMES 

( inpointer . type- C_att renames , 
atpointer) ; 

/* Allow user to review descriptors for accuracy. */ 

perform REVIEW_DESCRIPTORS # 

( inpointer . type-C_attr_names , 
atpointer ) ; 



return(atpointer ) ; 



4.10.24 end if ; 



4.10.25 end proc ; 
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4*12.1 proc DEFINE_RECTEMP /* RTEMPDEF (DBL112) */ 

( input : inpointer(clus tering^level , 
record_type, 
database_name, 
type-C_ att r _nam es , 
atpo inter ) , 

output : rectemppointer) ; 



/* Define the structure of the records in this file by */ 
/* building a record template. */ 

/* Input is a pointer to an list which contains */ 
/* clusterin g level ("FILE** or "DATABASE”), record type, *7 
/* database_name, a list of attribute names over wUicn type-C*/ 
/* descriptors are to be defined, and a pointer to the AT. */ 
/* Output is a pointer to the record template created */ 
/* for this task. */ 



4.12.2 



4.12.3 



4.12.4 



4.12.5 

4.12.6 

4.12.7 

4.12.8 

4.12.9 

4.12.10 

4.12.11 

4.12.12 

4.12.13 



4.12.14 



scalar record_type , 
attr_name , 
descr_type , 
rectemppointer, 

dditpo inter , 



duplicate, 
matched , 
successful ; 



/* Type of records in this file. */ 
/* Attribute name. */ 
/* A, B, C, or NOTFOUND. */ 
/* Pointer to RTEMP created for */ 
/* this task. *7 
/* Pointer into the DDIT returned *7 
/* from the FIND function of AT. */ 
/* Indicator - TRUE or FALSE. */ 
/* Indicator - TRUE or FALSE. */ 
/* Indicator - TRUE or FALSE. */ 



/* Attrlist to be returned from GET_ATTRIBUTE_CHARACTERISTICS */ 
/* procedure has the form: */ 

attrlist = (attr_name, attr_data_type } attr_format, */ 

attribute_charactenstics) ; 



record_type := inpointer .record_type ; 

/* Invoke the CREATE function of module RTEMPM to */ 

/* create an record template for this task. A *7 
/* pointer to the template is returned. */ 

perform RTEMPM$CREATE(record_type, rectemppointer) ; 

if create is not successful 
~tK*en 

display message; 

else 

bQgi n 

while more attributes to be defined do 
begin 

get attr_name from terminal; 

/* Check to see whether this attribute name is */ 
/* unique within this record. */ 

perform RTEMPM$DUPCHECK( rectemppointer , 



/* If it is unique, then get the characteristics*/ 
/* of the attribute. */ 



at t rename . 
duplicate) 



4.12.15 if duplicate is TRUE 

4.12.16 tHen 

4.12.17 display message 

4.12.18 else 

4.12.19 Isegin 

/* Get attribute characteristics, and return a */ 
/* list of attribute characteristics, attrlist. */ 
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4.12.20 



4.12.21 



4.12.22 



4.12.24** 

4.12.25 

4.12.26 

4.12.27 



4.12.28 

4.12.29 
4.12.30** 

4.12.31 

4.12.32 

4.12.33 

4.12.34 



4.12.35 



4.12.36 

4.12.37 

4.12.38 

4.12.39 

4.12.40 

4.12.41 



perform GET_ATTRIBUTE__CHARACTERISTICS 

(attr name, 
attrTiat); 

/* Determine whether there is a descriptor defined */ 
/* over this attribute name. If there is, mark the*/ 
/* record template entry to indicate type. This */ 
/* mark will be used in DERIVEJDIRFXTORYJCEYWORDS */ 
/* procedure. */ 

perform ATM$FIND(attrjaame , 

<Tescr_type , 
dditpointer ) ; 

if a descriptor has been defined 
for this attribute name 

/* descr type not * NOTFOUND */ 
then . “ 

attrlist + descr__type; 

else 

begin 

perform S EARCH_TYP E- C_ATTR NAME 

( inpointer . type-IT_af:t renames , 
attr name, 
matched) ; 

if matched is TRUE 
then 

attrlist + 'C' ; 

else 

attrlist + nullcharacter ; 
end begin ; /* Not type A or B, */ 
end if ; /* If descriptor defined. */ 

/* Add information about this attribute to the */ 

/* template. */ 

perform RTEMPM$INSERT(rectemppointer , 

attrlist , 
successful) ; 



if successful is FALSE 
then 

display message: 
end if : 

end while : /* While more attributes to be defined. */ 

end begin ; 



/* Allow the user to review the entire template. */ 



4.12.42 perform REVIEW__RECTEMP(rectemppo inter) ; 

4.12.44 end if ; 



4.12.45 end proc ; 



4.14.1 proc DERIVEJDIRECTORYJCEYWORDS /* DRVKWORD (DBL113) */ 

( input : inpointer (clustering level , 
record_type , 
database_name , 
type-C_attr_names , 
atpointer , 
rectemppo inter ) ) ; 



/* Input is a pointer to a list of arguments which contains */ 
/* clustering level ("FILE" or "DATABASE"), record type, */ 
/* database name, type-C_attr_names , (a list of attribute */ 
/* names over which type-C descriptors are to be defined), a*/ 
/* pointer to the AT and a pointer to the RTEMP for this */ 
/* rile. */ 
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/* The procedure reads records from the source file(s). */ 
/* For each record, the set of descriptors from which the */ 
/* record is derivable is determined, the record is format- */ 
/* ted into the form required for storage in the database, */ 
/* and a count of descriptors , # the descriptor list, and */ 
/* the formatted record are written to a rile for sorting. */ 



4.14.2 



4.14.3 



4.14.4 



4.14.5 

4.14.6 



scalar descr_id, 

descr_count , 
f ield^count , 
f ield_value, 
filename ; 



/* Descriptor id. */ 
/* of descriptors derived. */ 
/* Indicates nth field of record.*/ 
/* Value of nth field. # */ 
/* Name of current input file. */ 



array descriptor_ids ; /* List of descriptors derived.*/ 



predicate = (attribute, ,, = n value); /* Equality predicate */ 

/* to test derivation or keyword.*/ 



input_record = ( f ield_value$f ield_value$ ... field_value ); 

rectemp_entry = (attr_name, attr_data_type , lower_bound, 

upper bound, descr ind;; /* Entry re- */ 
” 7 * trieved. from RTEMP. */ 



4.14.7 

4.14.3 

4.14.9 

4.14.10 

4.14.11 

4.14.12 

4.14.13 

4.14.14 

4.14.15 

4.14.16 

4.14.17 

4.14.18 



4.14.19 

4.14.20 



4.14.21 



4.14.22 

4.14.23 

4.14.24 



4.14.25 

4.14.26 



open output file for records to be sorted 

while more files of input data do 
begin 

get filename from terminal; 
open file filename; 

while more records in file do 
begin 

/* Initialize counts, get first record. */ 
descr_count := 0; 
field_count := 0; 

get input_record from file filename; 

while more fields in record do 
begin 

/* Get a field value from the record. */ 

field_value := next field_value from input_record ; 
field_count := field_count + 1; 

/* Get the entry from the record template */ 

/* which contains attribute name and char-*/ 

/* acteristics of that field. */ 

perform RTEMPM$GETENTRY( inpointer . rectemppo inter , 

f ield_count , 
rectemp_entry ) ; 

if rectemp entry. descr ind not = null 
Then 

begin 

/* Build a keyword predicate to be used to test */ 
/* whether current attribute-value pair can be */ 
/* derived from any descriptor. */ 

predicate. attribute := rectemp_entry .attr_name; 
predicate. value := field_value; 

/* Determine whether keyword is derivable. */ 

/* Descriptor id will be updated by the */ 

/* DERIVE procedures called below. If the */ 
/* descriptor id is null, the keyword is not */ 
/* derivable. */ 



m 
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4.14.27 
** 

4.14.28 

4.14.29 



if rectemp^entry .descr ind = 'x' 

7^ 'x' indTcates that Tescriptor is type A or B .*/ 
then 

perform DERIVE FROM_A_OR_BJDESCR 
(predicate , 
inpointer .atpointer , 
descr_id) ; 



4.14.30 

4.14.31 

4.14.32 



else 

perform DERIVE FROM_C_DESCR 
(predicate , 
mpo inter .atpointer , 
descr_id) ; 

end if ; 

/* If it is derivable, insert the corresponding */ 
/* descriptor id into the list of such ids being*/ 
/* built for this record. */ 



4.14.33 

4.14.34 

4.14.35 

4.14.36 

4.14.37 

4.14.38 

4.14.39 

4.14.40 

4.14.41 

4.14.42 

4.14.43 

4.14.44 



4.14.45 

4.14.46 



if keyword predicate is derivable 
tTTen 

begin 

perform PUTJDESCR_ID_INTO_LIST 

(descr^_id , 
descrTp torpids 
descr_count ) ; 

descr_count := descr_ count + 1; 
end if ; 

end if ; 

end while ; /* more fields loop */ 

perform BUILD_SORT_RECORD(descr_count , 

descr iptor — ids , 
input__record , 
rectemppo inter) ; 

end while ; /* records in file loop */ 
close file filename; 

end while ; /* files to be sorted loop */ 

/* Review the list of type-C attribute names, and create */ 

/* null AT entries where no descriptors have been defined.*/ 

perform RE VI EW_TYP E- C_ATTR_NAME S 

( type-C_attr_names , 
atpointer ) ; 

close file for sort records; 



4.14.47 end proc ; 



9.10.1 proc PROCESS A CLUSTER /* PROCLUST (DBL141) */ 

" Tinput: record, 

system_inf o , 
critpointer ) ; 

/* Process a cluster for loading into the database store. */ 
/* Input is the first record of a cluster, some system */ 

/* information, and a pointer to the CDT created for this */ 
/* task. */ 

/* Records have the form; # # */ 

/* record = (descr_count , descr iptor_ids , database^ record )*/ 
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** /* System information is a list of backend addresses. */ 

** /* system_info = (number of backends, */ 

** /* backenE_addresses , */ 

/* track_capacity) ; */ 



9.10.1 scalar next_ backend_index , 
capacity_remaining , 

cluster number: 



/* Index into array of back-*/ 

/* end addresses in system info*/ 
/* Capacity remaining in tEe*/ 

/* track or the next backend*/ 

/* at which records of the */ 

/* current cluster are to */ 

/* be stored. Used to update*/ 

/* CINBT. */ 



/* Randomly select a backend at which to begin distribution of */ 
/* records in the first cluster, and set the track capacity. */ 

9.10.2 perform GET_RANDOM_BACKEND_START 
** (system_info. number of_backends, 

next_backend — indexT; 



9.10.3** capacity := system_inf o. track_capacity ; 



/* Generate a Cluster Definition Table entry for the new cluster. */ 
/* The following procedure returns the cluster number. */ 

9.10.4 /* From somewhere as yet undefined, get a new cluster number. */ 



9.10.5 

** 



perform 



CDTM$INSERTNEWCLUSTER 

( 



record.descriptor_ids 
cdtpo inter , 
cluster_number ) ; 



y 



** 

** 

irk 



/* Physically distribute the data over the multiple */ 
/* backends according to the selected data placement */ 
/* strategy. The DISTRIBUTE__RECORDS procedure, starting*/ 
/* at a randomly selected backend, evenly distributes */ 
/* data across the backends in track-size lots. The */ 
/* address of the next backend and the amount of stor- */ 
/* age available there are returned from the procedure */ 
/* to be used to update the CINBT. Note that the */ 
/* records are read ahead, so that upon return from */ 
/* the DISTRIBUTE_RECORDS procedure, the first record */ 
/* of a new cluster will have replaced the record ori- */ 
/* ginally passed. */ 



9.10.6** 






perform 



DISTRIBUTE_RECORDS( record , 

next__backend_index , 
svstem_info , 
cluster_number , 
capacity_remaining) ; 



/* Update the Clus ter-ID-to-Next-Backend-Table with the address*/ 
/* of the next backend into which records belonging to this */ 
/* cluster should be inserted, and the remaining capacity at */ 
/* that backend. */ 

9.10.7 perform CINBTM$UPDATE(cluster_number , 

next_ backend_index . 
capacity_remaining) ; 



end proc ; 



9.10.8 
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Fourth Level Specifications for Database Load 



4.10.16.1 proc DEF_TYPEA_DESCR /* TYPEADEF (DBLllll) */ 

( input : atpo inter ) ; 

/* Define all type A descriptors. Input is a pointer to */ 
/* the instance of AT created for this task. */ 



4.10.16.2 scalar attr_data_type , 

descr_id, 
descr_type , 
attr^name, 
duplicate , 
dditpointer ; 



/* Character, integer, etc. */ 
/* Descriptor id. */ 
/* A, B, C, or NOTFOUND. */ 
/* Attribute name. */ 
/* Indicator, TRUE or FALSE. */ 
/* Pointer into the DDIT, either */ 
/* to first descriptor defined */ 
/* for this attribute or to last */ 
/* descriptor inserted. */ 



4.10.16.3 



4.10.16.4 



descriptor = ( lower^bound , upper bound); 

/* An "other" descriptor will De "defined for each attribute */ 
/* over which descriptors are defined, to represent all those */ 
/* keywords which are not deriveable from any other descriptor*/ 
/* defined for that attribute. */ 
other_ descriptor = ( lower_bound , upper_bound) ; 



4.10.16.5 

4.10.16.6 

4.10.16.7 

4.10.16.8 

4.10.16.9 

4.10.16.10 



4.10.16.11 

4.10.16.12 

4.10.16.13 



4.10.16.14 

4.10.16.15 

4.10.16.16 

4.10.16.17 



/* Initialize "other" descriptor bounds. */ 
other__descr iptor . lower bound = null; 
other_ descriptor .upperT* 01111 ^ = null; 

get attr_name from terminal; 

get attr_ data_type from terminal; 

while more descriptors for this attribute do 

begin 

/* NOTE: Limits supplied for the descriptor must */ 

/* be right- jus tified , padded on left. */ 

get descriptor . lower bound from terminal; 
get descriptor .upper^ound from terminal; 

check upper and lower bounds to insure that data is 
of tne correct type; 

if data not of correct type 
Then . 

display an informational message; 

else 



4.10.16.18 
4.10.16 .19** 

4.10.16.20 

4.10.16.21 



4.10.16.22 

4.10.16.23 

4.10.16.24 

4.10.16.25 



4.10.16.26 



4.10.16.27 

4.10.16.28 

4.10.16.29 



begin 

duplicate false; 

descr_type null; 

perform ATM$FIND(attr__name , 

dditpointer , 

pointer to descr_type); 

if this attribute name found in AT 

/* descr__type not = NOTFOUND */ 

then 

begin 

perform DDITM$DUPCHECK 

(descriptor , 
dditpointer , 
duplicate) ; 



if this descriptor exactly duplicates 
another or tne range overlaps another 
/* duplicate is TRUE */ 
then 

display message; 

else 

/* Get descr id from somewhere as yet */ 
/* undefined. */ 
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4.10.16.30 



4.10.16.31 

4.10.16.32 

4.10.16.33 

4.10.16.34 



4.10.16.35 



4.10.16.36 



4.10.16.37 



4.10.16.38 

4.10.16.39 

4.10.16.40 

4.10.16.41 



else 



perform DDITM$INSERT(descriptor , 

dditpo inter , 
descr_id) ; 

end if 
end begin 

"begin 

/* Insert type "other” descriptor for each new */ 
/* attribute name added to the AT. */ 

/* Get descr id from somewhere as yet undefined.*/ 
perform DDITFT$INSERT(other_descriptor , 

dditpointer , 
descr id); 

/* Now insert the new attribute name. */ 

perform ATM$INSERT(attr^name, 

TA' , 

dditpointer): 

/* Now insert the new descriptor defined here. */ 
/* Get descr id from somewhere as yet undefined.*/ 
perform DDITH$INSERT(descriptor , 

dditpointer, 
descr_id) ; 

end begin ; 

end if ; /* ( in AT) */ 

end begin ; 

end if' ; /* (not of correct type) */ 



4.10.16.42 end while ; 

4.10.16.43 end proc ; 



4.10.19.1 proc DEFJTYPEBJDESCR /* TYPEBDEF (DBL1112) */ 

(input: atpo inter ) ; 

/* Define all type B descriptors. Input is a pointer to */ 
/* the instance of AT created for this task. */ 



4.10.19.2 scalar attr data type, 

descF_id,~ 
descr_type , 
at t rename , 
duplicate , 
dditpointer ; 



/* Character, integer, etc. */ 
/* Descriptor id. */ 
/* A, B, C, or N0TF0UND. */ 
/* Attribute name. */ 
/* Indicator, TRUE or FALSE. */ 
/* Pointer into the DDIT, either */ 
/* to first descriptor defined */ 
/* for this attribute or to last */ 



4.10.19.3 descriptor = ( lower^bound , upper_bound ) ; 

/* An "other" descriptor will be defined for each attribute */ 
/* over which descriptors are defined, to represent all those */ 
/* keywords which are not deriveable from any other descriptor*/ 
/* defined for that attribute. */ 



4.10.19.4 other_descriptor = ( lower_bound , upper_bound) ; 



4.10.19.5 

4.10.19.6 

4.10.19.7 

4.10.19.8 



4.10.19.9 

4.10.19.10 

4.10.19.11 

4.10.19.12 



/* Initialize "other” descriptor bounds. */ 
other_descriptor . lower bound = null; 
other_descriptor . upper3>o un( i = null; 

get attr_name from terminal; 

f et attr_data_type from terminal; 

* NOTE: Limits supplied for the descriptor must */ 

/* be right- justified , padded on left. */ 

while more descriptors for this attribute do 
begin 

descriptor . lower__bound := null; 

get descriptor .upper_bound from terminal; 

check upper bound to insure that data is 
of tne correct type; 



4.10.19.13 
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4.10*19.14 

4.10.19.15 

4.10.19.16 

4.10.19.17 

4.10.19.18 
4.10.19.19* 

4.10.19.20 

4.10.19.21 



4.10.19.22 

4.10.19.23 

4.10.19.24 

4.10.19.25 



4.10.19.26 

4.10.19.27 

4.10.19.28 

4.10.19.29 



4.10.19.30 



4.10.19.31 

4.10.19.32 

4.10.19.33 

4.10.19.34 



4.10.19.35 



4.10.19.36 



4.10.19.37 

4.10.19.38 

4.10.19.39 

4.10.19.40 

4.10.19.41 



if data not of correct type 
tTTen . 

display local message; 

else 



"begin 

duplicate = false: 
descr_type : = null; 
perform ATM$FIND( attr_name , 

dditpointer . 

pointer to aescr_type); 



if this attribute name found in AT 
Then 

begin 

perform DDITM$DUPCHECK(descr iptor , 

dditpointer, 
duplicate) ; 

if this descriptor exactly duplicates 

another or the range overlaps another 

then 

display message; 

e lse 

/* Get descr__id from somewhere as yet */ 

/* undefined. */ 

perform DDITM$INSERT( descriptor , 

dditpointer , 
descr_id) ; 

end if ; 
end b"egin ; 

else 

A* Insert type "other" descriptor for each new 
/* attribute name added to the AT. 

/* Get descr id from somewhere as yet undefined. 
perform DDlTTT$INSERT(other_descriptor , 

dditpo inter , 
descr id); 

/* Now insert the new attribute name. */ 
perform ATM$INSERT( attr_name , 

B , 

dditpointer); 

/* Now insert the new descriptor defined here. 

/* Get descr id from somewhere as yet undefined. 
perform DDlTH$INSERT(other_descriptor , 

dditpointer , 
descr_id) ; 

end begin ; 

end if /* (in AT) */ 

end beg in 

end if ; /* (not of correct type) */ 



4.10.19.42 end while ; 



4.10.19.43 end proc ; 



4.10.21.1 proc LIS T_TYP E- C_ATTR_NAMES /* TYPECLST (DBL1113) */ 

( input : type-C_attr_names , 
atpointer); 

/* List all the attribute names over which type-C descriptors */ 
/* are to be defined. Input is a list for attribute names */ 

/* over which type-C attributes are to be defined, and a */ 

/* pointer to the AT. */ 



4.10.21.2 scalar index, /* Index to list of attribute names. */ 

attr_ name , 

** duplicate, /* Indicator - TRUE or FALSE. */ 

dditpointer,/* Pointer into DDIT returned from ATM*/ 
/* FIND function. */ 

descr_type; /* A, B, C, or NOTFOUND. */ 



* * * * * 
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4.10.21.3 index := 1; /* Null indicates end of list. */ 

4.10.21.4 type-C_attr_names [ index] := null; 



4.10.21.5 

4.10.21.6 

4.10.21.7 

4.10.21.8 



4.10.21.9 



4.10.21.10 

4.10.21.11 

4.10.21.12 

4.10.21.13 
4.10 .21 .14** 
4.10.21.15 



4.10.21.16 

4.10.21.17 

4.10.21.18 

4.10.21 .19 

4.10.21.20 

4.10.21.21 

4.10.21.22 

4.10.21 .23 



while more type-C descriptors do 
begin 

get attr_name from terminal; 
perform ATM$FIND(attr name, 

cTditpointer , 

pointer to descr — type); 

if a type-A or type-B descriptor is already defined 
over this attribute name 
/* descr^type not = NOTFOUND */ 

then 

display error message; 
else 

Fegin 

duplicate = FALSE; 
perform S EARCH_TYP E- C_ATTR_NAME S 
( type-C_attr_names , 
att rename, 
duplicate) ; 

if duplicate is FALSE 
tTTen 

type-C_attr names [index] := att rename; 
index := inTex + 1; 
type-C_attr_names [ index] := null; 
end if ; 
end if ; 



4.10.21.24 end while ; 



4.10.21.25 end proc ; 



4.10.22.1 proc REVIEWJ)ESCRIPTORS /* REVDESCR (DBL1114) */ 

( input : type-C_attr_names , 
atpointer) ; 



/* S T U B */ 



4.10.22.? end proc ; 



4.12.19.1 proc GETJ1TTRIBUTEJCHARACTERISTICS /* ATTRCHAR (DBL1121) */ 

(input: attr_name, attrlist); 



/* Get characteristics of an attribute for and entry */ 
/* in the record template. */ 

/* Input to the procedure is an attribute name and a */ 
/* list for attribute characteristics. */ 

/* The values of those characteristics will be col- */ 
/* lected in this procedure. */ 

/* Attribute list has the form: */ 

/* attrlist = (attr name, 

/* value_3ata_type , 

/* value_format , 

/* value_charl. 

/* value__char2) ; 



Attribute name. */ 

String , integer, float . .*/ 
Fixed or variable (string) */ 
First characteristic. */ 

Second characteristic. */ 



4.12.19.2 

4.12.19.3 



attrlist . attr_name :** attr_jiame; 

get attrlist . value_data_type from terminal; 



4.12.19.4 



case attrlist . value_data_type value 



4.12.19.5 

4.12.19.6 

4.12.19.7 



integer : 
begin 

attrlist .value — format := null; 
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4.12.19-8 

4.12.19.9 

4.12.19.10 

4.12.19.11 

4.12.19.12 

4.12.19.13 

4.12.19.14 

4.12.19.15 

4.12.19.16 

4.12.19.17 

4.12.19.18 

4.12.19.19 

4.12.19.20 

4.12.19.21 

4.12.19.22 

4.12.19.23 

4.12.19.24 

4.12.19.25 

4.12.19.26 

4.12.19.27 

4.12.19.28 



get attrlist .value__charl ; 
get attrlist . value_char2 ; 
end begin ; 



/* Min value. */ 
/* Max value. */ 



string : 
begin 

/* Fixed or variable length string ? 
get attrlist. value format from terminal; 
if attrlist . value_Tormat is fixed 
Then 

]je£in 

/* Min length « 0; get max length, 
attrlist .value charl :* 0: 
get at tr list . value_char2 from terminal; 
end begin 
else 

|>egjji 

/* Get min and max lengths. 

get attrlist .value_charl from terminal; 

t et at trlist . value__char2 from terminal; 
if ; 

end beg in ; 
float: ; 
otherwise: ; 
end case 



*/ 



*/ 



4.12.19.29 end proc : 



4.12.27.1 proc SEARCHJTYPE-C_ATTR NAMES /* SRCHCLST (DBL1122) */ 

( input : type-^T_attr_names , 
attr — name, 

** output: found); 

/* Search the list of attribute names over which type C */ 

/* descriptors are to be defined to determine whether */ 

/* attr_name is a duplicate. Input is a list of attri */ 

/* bute names over wnich type-C descriptors are to be */ 

/* defined, and an attribute name. */ 



4.12.27.2 


scalar 


index, /* Index into 


list 


of attribute names.*/ 


** 




found; /* Indicator, 


TRUE 


or FALSE. */ 


4.12.27.3 


index 


:= 1; 






4.12.27.4** 


found 


:= false; 






4.12.27.5 


while 


type-C_ attr^names [ index] 
/* null indicates end 


not 

of 


= null 
list */ 






and 






4.12.27.6 




type-C_attr_names [ index] 


not 


= attr_name do 


4.12.27.7 


index := index + 1; 






4.12.27.8 


end while; 






4.12.27.9 


if type-C attr names [index] = 


attr 


_name 


4.12.27.10 


then 








4.12.27.11** 


found := true; 






4.12.27.12 


end if 


) 






4.12.27.13 


end proc; 









4.12.42.1 proc REVIEW_RECTEMP /* REVRTEMP (DBL1123) */ 

( input : rectemppointer ) ; 



/* S X U B */ 



4.12.42.? end proc ; 
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4.14.29-1 proc DERiyE_FROM_A_OR_B_DESCR /* DRVAORB (DBL1131) */ 

(input: predicate, atpointer, 
output: descr_id); 

/* Determine whether there exists a type A or type B descriptor */ 

/* from which the current keyword can be derived. */ 

/* Input is an equality predicate, and a pointer to the AT. */ 

/* A predicate has the form: */ 

/* predicate = (attribute, value) */ 

/* A descriptor id is returned to the calling procedure. */ 

4.14.29.2 scalar descr_id, 

dditpointer, /* Pointer into DDIT returned from */ 
/* FIND function of ATM. */ 

descr_type; /* A, B, C, or NOTFOUND. */ 



4.14.29.3 descr_id := null; 

/* FIND returns a pointer to first descriptor defined for */ 
/* this attribute name. */ 

4.14.29.3 perform ATM$FIND(predicate. attribute, 

dditpointer , 

pointer to descr_type); 

/* DERIVE returns the descriptor id for any descriptor from */ 
/* which this keyword can be derived. (May be null) */ 



4.14.29.4 perform DDITM$DERIVE(predicate , 

dditpointer , 
descr_id) ; 



4.14.29.5 end proc ; 



4.14.31.1 proc DERIVE_FROM_C_DESCR /* DRVC (DBL1132) */ 

(input: predicate, atpointer, 
output: descr_id) 

/* Determine whether keyword can be derived from an existing */ 
/* type-C descriptor. If not, define a new type-C descriptor. */ 
/* In put is an equality predicate, and a pointer to the AT. */ 
/* A predicate has the form: */ 

/* predicate 88 (attribute, value) */ 

/* A descriptor id is returned to the calling procedure. */ 



scalar descr_id, /* Descriptor id returned from DERIVE */ 

/* function of DDITM. */ 

dditpointer, /* Pointer into DDIT returned from */ 
/* FIND function of ATM. */ 

keep_ddit_jptr . /* Save pointer returned fromFIND */ 

/* to compare with that returned by */ 
/* INSERT function of DDITM. */ 

descr_type; /* A, B, C, or NOTFOUND. */ 

descriptor = ( lower_bound , upper_bound ) ; 



4.14.31.2 



4.14.31.3 



4.14.31.4 dditpointer := null; 

/* FIND returns pointer to the first descriptor defined for */ 
/* this attribute name in DDIT. */ 



4.14.31.5 



perform ATM$FIND( predicate . attribute , 

dditpointer . 

pointer to aescr_type); 
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4.14.31 .6 



4.14.31.7 

4.14.31.8 

4.14.31.9 

4.14.31.10 

4.14.31.11 

4.14.31 .12 
** 

4.14.31.13 



4.14.31.14 

4.14.31.15 

4.14.31.16 

4.14.31.17 

4.14.31.18 



4.14.31.19 

4.14.31.20 

4.14.31.21 

4.14.31.22 
** 

4.14.31.23 

4.14.31.24 
4.14.31.25** 

4.14.31.26 

4.14.31.27 

4.14.31.28 

4.14.31.29 



descr_id : = NULL; 

/* If this is the first type-C descriptor defined for this */ 
/* name, first insert the descriptor into DDIT, then put */ 

/* an entry in the AT with a pointer to that descriptor. */ 

/* If this is not the first type-C descriptor: defined for */ 
/* this attribute name, check to see whether this descrip- */ 
/* tor already exists. If it does, use the existing id; */ 
/* otherwise, insert the new descriptor into DDIT. */ 

if no descriptors yet defined for this attribute 
TFen 

begin 

des crip tor. lower bound =» null; 
descriptor .upper“""bound = predicate .value ; 

/* Get descr id Trom somewhere as yet undefined. */ 
perform DDlTH$INSERT(descriptor , 

dditpointer, 
descr_id) ; 

perform ATM$INSERT£gredica"te . attribute , 
dditpointer) ; 

end begin 

else /* Descriptors previously defined for this attr.*/ 
begin 

keep dd i t d t r := dditpointer; 

/* Does tTiis descriptor already exist? k / 
perf orm DDITM$DERIVE(predicate 

dditpointer , 
descr_id); 

/* If not, add it. 
if keyword is not derivable 
I* This is a new descriptor 
then 

begin 

/* Get descr id from somewhere as yet undefined. */ 
perform DDITH$INSERT(descriptor , 
dditpointer , 
descr_id); 

if keep_ddit_ptr != dditpointer 
"then 

perform ATM$UPDATE( predicate . attr ibute , 

dditpointer) ; 

end begin 
end if ; 
end begin ; 
end TF ; 



*1 

*/ 



4.14.31.30 end proc ; 



4.14.36.1 proc PUT_DESCR_ID_INTO_LIST /* PUTINLST DBL1133) */ 

(input: descr_ id, descriptor_ids , descr_count ) ; 

/* Insert a new descriptor id into the list of descriptor ids */ 
/* from which the current record can be derived. */ 

/* Input is a descriptor id, a list of descriptor ids, and a */ 
/* count of the number of items in the list. The list must be*/ 
/* a new id to be inserted into the list. The list must be */ 
/* maintained in ascending sequence. */ 



4.14.36.2 insert descr_id in order into list of descriptor ids 

4.14.36.3 end proc ; 
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4.14.41.1 proc 



BUILD_SORT_ RECORD /* BLDSRT (DBL1134) */ 

( input : descr_count , 

descnptor_ids , 
input_record, 
rectemppo inter) ; 



4.14.41.2 



sortrec = (descr_count , descriptor__ids , database_record) ; 



4.14.41.3 sortrec.descr^count := descr__count ; 

4.14.41.4 sortrec. descrTptor_ids := descriptor_ids ; 

4.14.41.5 format input_record into sortrec. database_record; 

4.14.41.6 write sortrec to output file for sort 

4.14.41.7 end proc ; 



4.14.45.1 proc 



4.14.45.2 



REVIEW_TYPE-C ATTR_NAMES /* REVTYPEC (DBL1134) */ 
Tinput : type-C_attr_names , 
atpointer ) ; 

/* Review the list of attribute names over which type-C 
/* descriptors are to be defined. If no descriptors have yet 
/* been defined for an attribute, create an entry in AT with 
/* a null pointer in place of a pointer into DDIT. 

/* Input is a list of the attribute names over which type-C 
/* descriptors are to be defined and a pointer to the AT. 



scalar 



index, 

dditpointer , 
descr__type; 



*/ 
*/ 

* . 

*/ 
*/ 
*/ 

/* Index into list of attribute names.*/ 
/* Pointer from AT into DDIT. */ 

/* A, B, C, or NOTFOUND. */ 



4.14.45.3 

4.14.45.4 



4.14.45 

4.14.45 



.5 

.6 



4.14.45.7 



4.14.45 

4.14.45 

4.14.45 

4.14.45 



4.14.45 

4.14.45 



.8 

.9 

.10 

.11 



.12 

.13 



index : = i ; 

while type-C_attrnames [ index] not = null do 
/* null indicates end of list */ 
begin 

perform ATM$FIND( type-C_^attr_names [ index] , 
daitpo inter , 
pointer to descr_type); 

if not found 

T* descr_type = NOTFOUND */ 
then 
Ije&in 

dditpointer := null; 

perform ATM$INSERT^type-C_attr_names [ index] , 

C., 

dditpointer ) ; 

end if ; 
end while ; 



4.14.45.14 ^end_proc; 



9.10.2.1 proc GET_RANDOM_BACKEND_START /* GETRAND (DBL1411) */ 

(input: number of backends, 
output : random_index_to_backends ) ; 

/* Randomly select a backend at which to begin distributing */ 
/* data from the current cluster. Input is a pointer to an */ 
/* argument list which contains the number of backends in */ 
/* the system. Output is a random number generated within */ 
/* the range of 1 to number of backends. */ 

scalar random_index_to_backends ; 



9.10.2.2 



m 
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9.10.2.3 

9.10.2.4 end proc ; 



generate random_index to__backends 

within the range T to number_of_backends ; 



9.10.6.1 

** 

kie 

kk 

kk 



proc 



kk 

kk 

kk 

kk 

kk 

kk 

kk 



DISTRlBUTEJtECORDS 

( input/output 



/* DISTREC (DBL1412) */ 

record, 

next_backend__index, 
input : system_inf o , 

cluster number, 

output : capabiTity — remaining) ; 

/* Physically distribute the data over the multiple backends */ 

*'/ 

*/ 

*/, 

* / 

*/ 
*/ 

*/ 

k f 



I* according to the track-spl itt ing-with-random-placement 
/* strategy. 

/* Input is the first record of a cluster, a randomly gener- 
/* ated index into the list of backend addresses, some system 
/* information, including the list of backend addresses, and 
/* the cluster number. Output is the capability remaining in 
/* the track at which records added to tnis cluster are to be 
/* stored. This capacity, together with the index and the 
/* cluster number, will Be used to update the CINBT. Note 
/* also that records are read at this level, so that when the */ 
/* procedure terminates, the variable record will contain the */ 
/* first record of the next cluster. 

/* Records have the form: 

/* record=(descr_count . de sc rip torpids , database_record) . 

/* System__info has the form: 

/* system__inf o=(number of backends, 

/* backenH^aHdr esses 

/* track__capacity *, 

/* Cluster number is a character string. 



k/ 

*/ 

*/ 

*/ 

*/ 

*/ 



9.10.6.2 scalar capacity remaining, /* Capacity remaining on the */ 

/* track of the backend at */ 
/* which the next record of */ 
/* this cluster is to be */ 
/* stored. x / 

next__backend_index,/* Index to backend addresses*/ 



prev~descr_count , 
new_cluster ; 

9.10.6.3 record = (descr_count , 

descriptor_ids , 

database_record ) ; 

9.10.6.4 array prev_descr_ p ids ; 

9.10.6.5 array full__track; 



/* Count of descriptor_ids 
/* from the previous record, 
/* TRUE or FALSE. 



*/ 

*/ 



/* Number of descriptor__ids */ 

/* in the list following. */ 

/* List of descriptor ids */ 

/* from which this record may*/ 

/* be derived. */ 

/* Record in format required */ 

/* for storage. */ 

/* Descriptor ids from the */ 

/* previous record. */ 

/* Array in which to accumulate*/ 
/* a full track of records. */ 



9.10.6.6** capacity__remaining : = sy stem_inf o . track_capacity ; 

9.10.6.7 while more records in cluster do 

9.10.6.8 begin 

/* Accumulate a full track of data before distributing */ 
/* data to the next backend. When a track is distributed*/ 
/* increment the next backend index to point to the next */ 
/* backend address ancT reset The capacity_remaining . */ 
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9.10.6.9 

9.10.6.10 


if capacity^ remaining >= size(record) 
tFen 


9.10.6.11 

9.10.6.12 

9.10.6.13 


begin 

add record to full_track array; 
capac ity_ remaining : = 

capacity remaining - size(record) ; 


9.10.6.14 


end 


9.10.6.15 


else 


9.10.6.16 

9.10.6.17 


begin 

distribute cluster number, full track to 
backend at 

sy s t em_inf o • b ackend_add r e s s 

[next_ backend_index] ; 


9.10.6.18 


next_backend_index = 

(next_backend_index +1) 

mod (number_of_backends ; 


9.10.6.19 


capacity_remaining : = 

system into* track capacity; 


9.10.6.20 


end if ; 

/* Save descriptor count and list of descriptor ids from */ 
/* the current record for comparison with the next to */ 
/* detect cluster change. */ 


9.10.6.21 

9.10.6.22 


prev — descr_count :* record. descr count; 
prev_descr_ids := record. de s cr__icTs ; 

/* Read the next record */ 


9.10.6.23 


read a record from file of sorted records; 


9. 1C. 6. 24 


perform CHECK FOR NEW CLUSTER 

(record, 

prev_descr_ count , 
prev descr ids, 
new_cluster ) ; 


9.10.6.25 


end while; 


9.10.6.26 

9.10.6.27 

9.10.6.28 


if full track array is not empty 
TEen 

distribute cluster_number , full track to backend 
at system_info. backend address 

[next_backend_index] ; 


9.10.6.29 

9.10.6.30 


end if ; 
end proc; 



9.10.6.30 



Fifth Level Specification for Database Load 



9.10.6.25.1 proc CHECK FOR NEW CLUSTER /* NEWCLUST (DBL14121) */ 

(input: record, 

prev_descr_count , 
prev__descr__ids , 

** output: new_c luster) ; 

/* Check the list of descriptor__ids from the current */ 

/* record against the prev_descr_ids list from the */ 

/* previous record. If the lists are different lengths */ 
/* a new cluster is indicated. If the lists are the */ 
/* same length, compare them item by item to determine */ 
/* whether a new cluster is indicated. */ 

/* Records have the form: # # */ 

/* record=(descr_count , descriptor_ids , database_record)*/ 



9.10.6.25.2 



scalar index. /* Index to both lists of ids. */ 

new_cluster; /* Indicator, TRUE or false. */ 



9.10.6.25.3 

9.10.6.25.4 

9.10.6.25.5 

9 .10 .6 .25 .6** 

9.10.6.25.7 

9.10.6.25.8 

9.10.6.25.9 
9.10.6 .25.10** 

9.10.6.25.11 

9.10.6.25.12 



new_cluster := false; 
if record. descr__count not = 

• prev_descr__count 

then 

new_c luster : = true; 
else 

Begin 

index := 1; /* Set index. */ 

new cluster := false; 

whiTe index <= record. descr_count do 



9.10.6.25.13 



9.10.6.25.14 

9.10.6.25.15** 

9.10.6.25.16 

9.10.6.25.17 



if record. descriptor_ids [ index] not = 

prev_ descr_ids [ index] 

then 

new^cluster :=* true; 

else 

index : = index + 1; 



9.10.6.25.18 



end while ; 



9.10.6.25.19 end if 



9.10.6.25.20 endproc ; 



C.2 Part II 2 . Record Template Module 



/* 

/* 

/* 

I* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 



;i; 

2 

3 , 

4) 

. 5 ; 

( 6 ) 



Part II - Record Template Module 



(7) 



Record Template Module 
P. R. Strawser 
August 25, 1981 



*/ 
*/ 
*/ 
*/ 
*/ 
*/ 

Purpose: */ 

The record template module provides ser- */ 
vices for record template data structures.*/ 



Design: 
Designer: 
Date : 
Modified: 



A record template data structure is a 



/ 



tabular collection of information about the*/ 
records of one file, where all records are */ 
assumed to have the same format. Each */ 
template is identified by record type, and */ 
contains a count of entries and an entry */ 
for each field in the record. Each entry */ 
contains field name, data type, length in- */ 
ftprmation, and an indication or whether the*/ 
field might be a descriptor. */ 

*/ 

Output Data: */ 

A record template for a given file. */ 



(8) Procedure Structure for RTEMPM 



RTEMPM 

(Module) 



RTEMPM$ 

CREATE 



RTEMPM$ 

DUPCHECK 



RTEMPM$ 

GETENTRY 



RTEMPM$ 

INSERT 
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(10) Program Specifications 
module RTEMPM 

programs CREATE, DUPCHECK, GETENTRY, INSERT; 
data structures record template; 
end module 



proc RTEMP M__C REATE ( input : record_type, 

output : rectemppointer) ; 

/* Name a record template structure with the name record_type */ 

/* and initialize count of entries to zero. */ 

/* A record template has the structure: */ 

/* record_template = (count, entry [no_entries ]) ; */ 

/* An entry in the record template has the structure: */ 

/* entry = (attr_name, data_type, format, lenl , len2 , */ 

/* descr_ind); */ 

scalar rectemppo inter ; /* Pointer to record template structure. */ 



Allocate a record template data structure with the name 
record_type ; 

rectemppointer := pointer to allocated data structure; 
rec temppointer .count := 0; 



end proc ; 



proc RTEMPM__DUPCHECK( input : rectemppointer , 

attr name, 
output : dupTicate ) ; 

/* Check to see whether there is already an entry in this record */ 

/* template with an attribute name equal to the input attribute */ 

/* name. Input is a pointer to the record template and an attri-*/ 
/* bute name* Output is an indicator with a true or false value.*/ 
/* A record template has the structure: */ 

/* record_template = (count, entry [no_entries ]) ; */ 

/* An entry in the record template has the structure: */ 

/* entry = (attr_name, datatype, format, lenl, len2 , */ 

/* descr_ ind); */ 

scalar duplicate, /* Indicator with TRUE or FALSE value. */ 
counter; /* Local variable. */ 



counter : = 1 ; 
duplicate := false; 

while counter is less than or equal to rectemppointer . count 
& duplicate is false do ; 

if attr_name = rectemppointer . entry [counter ]. attr_name 
then duplicate := true; 
end while ; 



end proc ; 
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proc RTEMPM GETENTRY( input : rectemppointer , 

““ f ield_ number , 

output : rectemp_entry ) ; 

/* Get the entry indicated by field number from the record template */ 
/* pointed to by rectemppointer , ancT return the information to the */ 
/* calling procedure. */ 

/* A record template has the structure: . */ 

/* record_template 3 (count, entry [no_entries ]) ; */ 

/* An entry in the record template has the structure: */ 

/* entry = (attr name, data type, format, lenl , len2, */ 

/* descr_ind); */ 

/* A record template entry. */ 

rectemp_entry = (attr name. value_data type, value_format , 

vaTue_charl , value_char2 , descr_ind); 

if field_number greater than rectemppointer .count 
tTTen 

rectemp_entry := null; 

else 

rec temp_entry : = rectemppo inter . entry [ f ield_number ] ; 
end if ; 

end proc ; 

proc RTEMPM_INSERT( input : rectemppointer, 

rectemp_entry , 
output : successful) ; 

/* Insert an entry in the next available slot of the record template */ 
/* pointed to by rectemppointer. Output is an indicator indicating */ 
/* success or failure of the operation. */ 

/* A record template has the structure: */ 

/* record_template 3 (count, entry [no_entries ]) ; */ 

/* An entry in the record template has the structure: */ 

/* entry = (attr_name^ data_type, format, lenl, len2 , */ 

/* descr_ina); */ 

scalar successful; /* Indicator with TRUE or FALSE value. */ 



successful :* true; 

rectemppointer .count := rectemppointer .count + 1; 

if rectemppointer .count > maximum fields per record 
tTTen 



else 



end if ; 
end proc ; 



successful := false; 

rectemppo inter . entry [rectemppo inter . count ] = rectemp_entry ; 



! 
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APPENDIX D 

THE SSL SPECIFICATION FOR DIRECTORY MANAGEMENT 



The system specification for directory management is given in this ap- 
pendix. The specification consists of five parts: the top level of directo- 
ry management, one service abstraction, and three data abstractions. 

In Part I, the top level of directory management is specified. In Part 
II, the service abstraction employed in directory management is specified. 
This abstraction, known as directory interface, accepts the output of des- 
criptor search and produces the input for cluster search. The data abstrac- 
tions for attribute table, descriptor-to-descriptor-id table, and 
cluster-definition table are specified in Parts III, IV, and V, respectively. 



D.l Part 1^ - The Top Level of Directory Management 



/* 

/* 

/* 

/* 

/* 

/* 

/* 

/* 



(1) Part I 
\ 2 ) Design 

Designers 
Date 

Modified 



( 6 ) 



The Top Level of Directory Management 
DIRECTORY_MAN 
T,M, Ozsu, A, Orooii 
July 28, 1981 
Aug. 4. 1981 
Sept. 11, 1981 

Purppse : 

This is the directory management subsystem. The 



inputs are a 



*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 



/* pointer to a table which contains either tne keywords in a record or */ 
/* the predicates in a ‘query, either the number of keywords in the * / 
/* record or the number of predicates in the query, and a schedule */ 
/* number that is used in determining the range or keywords or */ 
/* predicates _ this backend is supposed to process. The output is either */ 
/* a cluster id (request type=insert) or a set of addresses (request */ 
/* type=non-insert ) , */ 
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(8) Procedure Hierarchy for DIRECTORY_MAN 



DIRECTORY MAN 



DIRECTORY MAN$ 
INS DESC 'SR 



DIRECTORY MAN$ 
NINS DESC SR 



_D|S 






DIRECTORY MAN$ 
INS CLUS GR 



+ 



DIRECTORY MAN 
NINS ADDR GR 






DIRINT$ DIRINT$ DIRINT$ 
CREATE DEFPRED BROADCAST 






DIRINT$ 

GET ALL DESC 



— + 



CDTM$ 

FIND SINGLE CLUS 



DIRINT$ 

NO DESC GR 



DIRINT$ 

NEXT DESC GR 



CDTM$ 

FIND ADDRESS 



+ 



(9) Data Structures 

/* The data structure definitions are included in the program */ 
/* specifications. */ 



(10) Program Specifications 

1. subsystem DIRECTORY MAN( input : inptr, number, schedule_no, 

output : (clustered, addresses}) ; 



2. set addresses; 

3. Find the Attribute Table of the current database, call it AT; 

4. _if request type is INSERT 

5. then begin /* inptr=recordptr ; number=no. of keywords */ 

6. pertorm INS_DESC_SR( inptr, number, schedule_no, AT); 

D o the descriptor search for the keywords in the record */ 

7. perform INS CLUS GR( inptr, cluster id); 

/* find "the cluster the record belongs to */ 

8. return ( c luster — id) ; 

9 . end begin 

10 . else begin 

/*n on- insert: inptr=queryptr : number=no. of predicates */ 

11. perfo rm NINS_DESC_SR( inptr, number, schedule_no, AT); 

T* Do the descriptor search phase for the predicates in */ 

/* the query */ 

12. perform NINS ADDR GR( inptr, addresses); 

/*" find the addresses of the records in clusters which */ 

/* may satisfy the query */ 

13. return( addresses; ; 

14. end iT 

15 . end subsystem ; 



6.1 proc INS DESC SR( input : record_ptr, no_keywords, schedule_no. AT); 

/* This procedure handles the insert cases. Given a record, tne number */ 

/* of keywords in the record, the schedule number and the Attribute */ 

/* Table, it computes the range of keywords it is supposed to handle */ 

/* and works on the keywords m that range. */ 

6.2 type := 'insert'; 
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6.3 calculate the range of keywords to work on; 

6.4 conjunc_no :® 1; 

6.5 keyword no := starting keyword number; 

6.6 perf ormn DIRINT$CREATErrequest id) ; /* create a new RDIT table */ 

6.7 while there are keywords in range do 

6 .8 Leg in 

6.9 loc parameter (conjunc._no) I I (keyword_no) ; 

/^location parameter consists of conjunction number */ 

/* concatenated with keyword number within that */ 

/* conjunction. In ins€:rt cases, cunjunction number is 1 */ 

6.10 pick next keyword; 

6.11 form an equality predicate; 

6 .12 perform DiRINT$DEFPRED( type, loc_ parameter, predicate, AT ) ; 

/* Find the descriptors that satisfy the predicate */ 

6.13 keyword no := keyword_no + 1; 

6.14 end while 

6.15 pVrToim D'IRINT$BROADCAST; 

/* Broadcast the descriptor ids to all the other backends */ 

6 .16 end proc ; 



11.1 proc NINS DESC SR( input : query_jtr, no__predicates, schedule_no. AT); 

/* This procedure handles non- insert cases. Given the query, tne */ 
/* total number of predicates in the query, the schedule number and */ 
/* the AT, it computes the range of predicates it is supposed to */ 
/* handle and works on the predicates in that range. We recall that */ 
/* each backend handles 1/n of the predicates, where n is number of */ 
/* backends. */ 



11 .2 

11.3 

11.4 

11.5 

11.6 

11.7 

11.8 

11.9 

11.10 
11.11 
11.12 
11 .13 

11.14 

11.15 

11.16 

11.17 

11.18 



type : = 'non-insert'; 

compute the range of predicates to be worked on; 
coniunct_no := first conjunction number in the range; 
predicate no := starting' predic ate number in the range; 
perform DTRI NT $C REATE ( reques t i d ) ; /* create a new RDIT table */ 
while there are conjunctions in tne range do_ 

pick next conjunction; 

do begin /* do for each predicate */ 

loc parameter := ( c onjunct_no ) I I (predicate_no ) ; 
perTorm DIRINT$DEFFRED( type, loc parameter , predicate, AT) ; 
predicate_no := predicate_no + T; 
until Cend of predicates in this conjunction) or 

(end of predicates in the range); 
conjunct_no := conjunct_no + 1; 
end while; 



perform" TfrRINT$BROADCAST 

T^Troadcast the descriptor ids to all the other backends 
end proc ; 



*/ 



7.1 proc INS CLUS GR( input : recordptr, output : cluster_id); 

r* This procedure finds the cluster to which the record being */ 
/* inserted belongs. If the descriptors of the record define a */ 

/* new cluster, it signals this to the controller. */ 

7.2 list descriptor id group; /* used internally for keeping */ 

/* descriptor-id group */ 

7.3 wait until RDIT tables are obtained from all backends; 

7.4 join all these RDIT tables into one RDIT table; 

7.5 perform DIRINT$GET ALL DESC (descriptor id g roup ) : 

/* Get the descriptor- id group for the recora being inserted */ 

7.6 per form CDTM$FIND SINGLE CLUStdescr ip tor id group, cluster_id); 

r* F ind the cTuster That the record being inserted belongs to */ 

7 .7 re turn (cluster id) ; 

T* If cluster is found, its id is returned. Otherwise a null */ 
/* value is returned. */ 

7 .8 end proc ; 
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12.1 proc NINS ADDR GR( input : queryptr, output : address list); 

/* This procedure finds the addresses of the record’s in this */ 

/* backend that may satisfy the query. */ 

12.2 scalar conjunct_no, no group » index : integer; 

12.3 list addresses; 

12.4 list descriptor id g roup; 

12.5 list cluster_nos; 

12.6 wait until RDIT tables are obtained from all backends; 

12.7 join all these RDIT tables into one RDIT table; 

12.8 coniunct_no := 1; 

12.9 while there are conjunctions in the query do 

12.10 begin 

12.11 ‘ perform DIRINT$NO DESC GR(con junct_no, no group); 

7* find the numFer o T descriptor-id groups for this */ 

/* conjunction. */ 

12.12 for index from 1 J^o. no group by 1 do. 

12.13 begin 

12.14 perform DIRINT$NEXT_DESC GR(conjunct no, 

"descriptor i? group ) ; 

/* Get the next descriptor-id group. */ 

12.15 perform CDTM$FIND_ADDRESS(descr ip tor id group. 

addresses) ; 

/* Find the addresses of the records. */ 

12.16 address list = address_list + addresses; 

/* AdcT the addresses found to the address list; */ 
/* caution: duplicates are eliminated. */ 

12.17 end for : 

12.18 conjunct no := conjunct no + 1 ; 

12.19 end while ; ~ ~ 

12.20 return CaFaress list) ; 

12.21 end proc ; 



D.2 Part II - The Service Abstraction (DIRINT) 

/* (1) Part II : The Service Abstraction 

/* (2) Design ; DIRINT 

/* (3) Designers : T.M. Ozsu. A. Orooji 

/* (4) Date : Aug. 4. 1981 

/* (5) Modified : Sept, ll, 1981 

/* (6) Purpose : 

/* This is the service abstraction employed in directory 

/* management. This abstraction, known as directory interface, 
/* accepts the output of descriptor search and produces the 
/* input for cluster search. 



*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 



(8) Procedure Hierarchy for DIRINT 
DIRINT 



-t 

1 








j 






1 

DIRINT$ 


DIRINT$ 


DIRINT$ 


1 

DIRINT$ 


DIRINT$ 


DIRINT$ 


CREATE 


DEI 


'PRED 


BROADCAST 


GET_ALL_DESC 


N0_DESC_GR 


NEXT_DESC_GR 


+ 








— t 







ATM$ DDITM$ DDITM$ DDITM$ 

FIND CDERIVE DERIVE INSERT 
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(9) Data Structures 

/* The data structure definitions are included in the program */ 
/* specifications. */ 



(10) Program Specification 



mod DIRINT 

programs DEFPRED, GET_ALLJ)ESC, NO DESC GR, NEXT_DESC_GR, BROADCAST 
datasets' request_descriptor_id_tabTe (RUIT) 

7*“ A table of ( loc_parameter , descriptor id) pairs for */ 

/* the present request */ 

end mod 



6.12.1 proc DEFPREDC input : type, loc parameter , predicate, AT); 

T* This procedure finds all "the descriptors that satisfy */ 
/* a predicate. */ 



6.12.2 list desc ids; 



/* list of descriptor ids satisfying */ 
/* the predicate */ 



6.12.3 perform ATM$FIND(AT t attribute, dditptr, descr iptor_type ) ; 

| /* F ind the pointer to DDIT entry for the given attribute */ 

6.12.4 i£_ search successful 

6.12.5 then begin 

6.12.6 if (type = 'insert') and (descriptor type = 'C') 

6.12.7 then begin 

6.12.8 perform DDITM$CDERIVE( predicate, dditptr, desc_ids ) ; 

6.12.9 if keyword not derivable 

6.12.10 then begin /* a new type-C descriptor */ 

6.12.11 new_desc_id := a new descriptor id; 

/* Give this descriptor a new id */ 

/* and insert it into DDIT */ 

6.12.12 perform DDITM$INSERT(descriptor, 

new_jiasc_id* dditptr 2 ) : 

/* insert the new descriptor into DDIT */ 

6.12.13 value (RDIT 1 loc parameter) := new_desc_id; 

/* Insert the new descriptor id into RDIT 

6.12.14 end begin 

6.12.15 else 

6.12.16 value (RDIT, loc parameter ) := desc ids; 

6.12.17 end if 

6.12.18 end begin 

6.12.19 else begin 

6 .12 .20 perf orm DDITM$DERIVE( predicate, dditptr, desc_ids) ; 

/* Find those descriptors from which this */ 

/* predicate is derivable and put their ids */ 
/* into the desc_ids list */ 

6.12.21 value (RDIT. loc parameter) := desc_ ids; 

T* add a new pair for each descriptor id in */ 

. /* desc_ids list */ 

6.12.22 end if 

6.12.23 end if 

6.12.24 end proc ; 



7.5.1 proc GET ALL DESC ( output : descriptor id group): 

T* This procedure gets the descriptor ids of the descriptors */ 
/* from which the keywords in a record have been found to be */ 
/* derivable. */ 

7.5.2 descriptor id group :* null; 

7.5.3 djo /* colTeci all descriptor ids */ 

7.5.4 descriptor id group :* descriptor id group + 

descriptor id at current location of RDIT; 

7.5.5 until (end of RDIT table); 

7 .5 .6 return (descriptor id group ) ; 

7.5.7 end proc ; 
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12.11.1 proc NO DESC GR( input : conjunct__no, output : no group); 

T* XhTs procedure finds the number of descriptor- id groups for */ 
/* the given conjunction. Furthermore, it initializes certain */ 
/* arrays and counters that will be used by NEXT_JDESC_GR */ 

12.11.2 array counter. desc_per_pred : integer; /* global arrays */ 

12.11.3 scalar pred no. index : integer; 



12.11.4 find the beginning of predicates for conjunct^ no in RDIT; 

12.11.5 pred_no : = I: 

12.11.6 no group := 1; 

12.11.7 inHex := 1; 

12.11.8 do begin /* initialize desc per__pred array */ 

12.11.9 aesc_per_pred( index) := U; 

12.11.10 index := index + 1; 

12.11.11 until (end of desc per_pred array); 

12.11.12 do begin /* Calculate number or descriptors for all */ 

/* predicates in the given conjunction. */ 

12.11.13 loc_param := conjunct_no IT pred_no; 

12.11.14 do /* calculate no. of descriptors for one predicate */ 

12.11.15 pick next RDIT entry; 

12.11.16 aesc_per__pred(pred_no) : = desc__per _pred(pred_no) + 1; 

12.11.17 until (loc param ~= location parameter in RDIT) jor 

(end of RDIT); 

12.11.18 no group := no group * desc per_pred( pred no); 

/* keep a running total ofnumber of“~*/ 

/* descriptor-id groups */ 

12.11.19 pred_no := pred^no + 1; 

12.11.20 until (end of conjunctTon) ; 

12.11.21 desc per pred(O) := pred_no - 1; 

/* TTeep the total no. of predicates in conjunction */ 

12.11.22 index := 1; 

12.11.23 do begin /* set counter array to 1 ; to be used */ 

/* in NEXT DESC GR */ 

12.11.24 counter( indexT T; 

12.11.25 index := index + 1; 

12.11.26 until (end of counter array); 

12.11.27 return (no group) ; 

12.11.28 end proc ; 



12.14.1 proc NEXT DESC GR( input : conjunc_no, 

output : descriptor id group); 

/* This procedure generates the next descriptor-; id group that */ 
/* satisfies the predicates in the conjunction identified by */ 
/* conjunc_jno. */ 



12.14.2 array desc per p red, counter : integer; /* global arrays */ 

12.14.3 scalar index, eff__index : integer; 



12.14.4 

12.14.5 

12.14.6 

12.14.7 

12.14.8 

12.14.9 

12.14.10 

12.14.11 



12.14.12 

12.14.13 



12.14.14 



find the beginning of predicates for conjunc_no in RDIT; 
eff index := beginning position- 

/* The for loop finds the next descriptor- id group */ 
descriptor id group := null; 

for^ index “From 1 to, no. of predicates bv 1 do, 

descriptor id group : = descriptor id group + 

descriptor # id at RDIT( ef f_index+counter( index) ) ; 
eff_index := eff_index + desc_per_pred( index) ; 
end for ; 

/* In the remainder, the counter array is updated for the 
/* next invocation 
index := no. of predicates; 
counter( index) := counter( index) + 1; 

/* indicate that the next descriptor id for the last 
/* predicate will be picked up next time 
while counter( index) > desc_per^_pred( index) do, 

/* If the last descriptor id for that predicate is 
/* already picked up, indicate that the next 
/* descriptor id for the predicate immediately prior */ 
/* to this one will be picked up next time, together */ 



*/ 

*/ 



*/ 

*/ 

*/ 

*/ 
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12.14.15 

12.14.16 



12.14.17 

12.14.18 

12.14.19 

12.14.20 



12.14.21 

12.14.22 

12.14.23 

12.14.24 



/* with the first descriptor id of this predicate. */ 

/* This is done by setting the counter entry */ 

/* corresponding to present predicate to 1 and */ 

/* incrementing the counter entry corresponding to the */ 

/* immediately previous one. Keep doing this until no */ 

/* more adjustments are necessary. */ 

begin 

counterC index) := 1: 

/* next time tne first descriptor id for this */ 

/* predicate will be picked up */ 

index := index - 1; /* look at the previous predicate */ 

if index = 0 /* if all the descriptor- id groups have */ 



/* been picked 
then exit /* leave the loop */ 
else counter( index) : = counterC index) + 1; 

/* else increment count for the previous 
/* predicate 
end if 
end while 

return Cdescriptor id group) ; 
end proc ; 



*/ 



*/ 

*/ 



6.15.1 proc BROADCAST; 

~T* This procedure broadcasts the RDIT to all the other backends */ 

6.15.2 broadcast RDIT; 

6.15.3 end proc ; 



6.6.1 proc CREATE ( input : requested); 

7^ This procedure creates an occurrence of RDIT table for the */ 
/* given request. */ 

6.6.2 end proc ; 



D.3 Part III - The Data Abstraction for Attribute Table 



/* 

/* 

>,* 

/* 

/* 
!,* 
/* 
i ;■ * 
/* 



) Part III 
) Design 
) Designers 
) Date 
) Modified 

) Purpose 
This is the 



The Data Abstraction for Attribute Table 
ATM 

T.M. Ozsu, A. Orooji 
July 28, 1981 
Aug. 4, 1981 
Sept. 11, 1981 



data abstraction for attribute table. Operations 
on attribute table are done via the procedures in this abstraction. 



*/ 

*/ 

*/ 

*/ 

*1 

*/ 

*/ 



(8) Procedure Hierarchy for ATM 
ATM 

H -i + + + 



ATM$ ATM$ ATM$ ATM$ 

FIND INSERT DELETE CREATE 



(9) Data Structures 

/* The data structure definitions are included in the program */ 
/* specifications. */ 
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(10) Program Specifications 



mod ATM; 

programs FIND, INSERT, DELETE, CREATE 
datasets AT /* attribute table */ 
end mod ; 



6.12,3.1 proc FIND( input : AT* attribute, output : dditptr* type) ; 

7 * This procedure finds the location of the attribute in AT. */ 
/* It returns the pointer to the DDIT for that attribute */ 
/* and the type of descriptors specified on the attribute. */ 



6.12.3.2 find the matching attribute; 

6.12.3.3 dditptr := pointer at that position; 

6.12.3.4 type := type of descriptors defined on that attribute; 

6.12.3.5 return (dditptr, type); 

6.12.3.6 end proc ; 



1. proc INSERT( input : AT, attribute, dditptr); 

/*" This procedure inserts an (attribute, pointer) pair into AT. */ 

2. search for the position where attribute fits; 

3. insert new (attribute, dditptr) pair; 

4. end proc ; 



1. proc DELETE ( input : AT, attribute); 

/* This procedure deletes an (attribute, pointer) pair from AT. */ 

2. find the matching attribute; 

3. delete the entry at that position; 

4. end proc ; 



1. proc CREATE; 

7*" This procedure creates a new instance of the attribute table */ 
/* and returns a pointer to it. */ 

2. create a new instance of the attribute table; 

3. insert the database name together with the pointer to the 

new AT into the index table for AT's; 

4 . end proc ; 



1. proc UPDATE ( input : AT, attribute* dditptr); 

/* This procedure updates the dditptr of the given attribute to */ 
/* the new dditptr given as input. */ 

2. find the attribute m AT; 

3. replace the dditptr for the attribute with the new one; 

4. end proc ; 



D.4 Part IV - The Data* Abstraction for Descriptor-to-Descriptor-Id Table 



/* (1) Part IV : The Data Abstraction for DDIT 

/* (2) Design : DDITM 

/* (3) Designers : T.M. Ozsu, A. Orooii 

/* (4) Date : July 28, 1981 

/* (5) Modified : Aug. 4, 1981 

/* , Sept. 11, 1981 

/* (6) Purpose : 

/* This is the data abstraction for DDIT. Operations 

/* done via procedures in this abstraction. 



on 



DDIT are 



*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 

*/ 
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(8) Procedure Hierarchy for DDITM 
DDITM 



+ 










DDITM$ 


DDITM$ 


DDITM$ 


DDITM $ 


DDITMS 


CDERIVE 


DERIVE 


INSERT 


DUPCHECK 


CREATE 






(9) Data 


Structures 





/* The data structure definitions are included in the program */ 
/* specifications. */ 



(10) Program Specifications 

mod DDITM 

programs DERIVE, CDERIVE, INSERT, DUPCHECK, CREATE 
datasets DDIT /* Descriptor- to-descriptor-id table */ 
end mod ; 



6.12.20.1 proc DERIVE ( input : predicate, dditptr, output : desc_ids); 

/* This procedure finds out the ids of~"HTl the descriptors */ 
/* from which the predicate can be derived and returns these */ 
/* ids in desc_ids. This routine is used for all the cases */ 
/* except when the request is insert and attribute of the */ 
/* keyword is used in type-C descriptors. */ 



6 . 12 . 20.2 

6.12.20.3 

6.12.20.4 

6.12.20.5 

6 . 12 . 20.6 

6.12.20.7 

6 . 12 . 20.8 

6.12.20.9 

6.12.20.10 
6.12.20.11 



desc_ids := null; 
do begin 

if predicate derivable from descriptor at DDIT(dditptr ) 
then 

desc_ids := desc — ids + 

descriptor id at DDIT(dditptr ) ; 

end if ; 

dditptr : = next entry position in DDIT; 
until (descriptors on the same attribute as predicate's finishes) 
return (desc ids ) ; 
end proc ; 



6.12.8.1 proc CDERIVE( input : predicate, dditptr, output : desc_id); 

T* This procedure finds out the id of the descriptor from */ 
/* which the predicate can be derived and returns this id. */ 
/* This routine is used only when the request is insert */ 
/* and attribute of the keyword is used in type-C */ 
/* descriptors. */ 



6.12.8.2 do begin 

6.12.8.3 if predicate derivable from descriptor at DDIT(dditptr ) 

6.12.8.4 then begin 

6.12.8.5 desc__id := descriptor id at DDIT(dditptr ) ; 

6.12.8.6 return ; 

6.12.3.7 end if 

6.12.8.8 dditptr : = next entry position in DDIT; 

6.12.8.9 until (descriptors on the same attribute as predicates's finishes) 

6.12.8.10 return ( 'not derivable'); 

6.12.8.11 end proc ; 
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1. proc DUPCHECK( input : descriptor, dditptr, output : answer); 

T* Given a descriptor, this procedure checks to make sure that */ 

/* its range does not overlap the ranges of other already */ 

/* defined descriptors in DDIT. */ 

2. answer := 'no'; 

3 . d£ begin 

4. if descriptor range overlaps the range of that 

pointed at by dditptr 

5 . then begin 

6. answer : = 'yes'; 

7 . return ; 

8 . end if ; 

9. dditptr := next descriptor in DDIT defined on the same attribute; 

10. until (no more descriptors on the same attribute); 

11 . end proc ; 



6.12.12.1 proc INSERK input : descriptor, desc_id, output ; dditptr); 

T* This procedure inserts a descriptor and its id into DDIT. */ 

6.12.12.2 find the place for the descriptor; 

6.12.12.3 insert the descriptor; 

6.12.12.4 end proc ; 



1 . proc CREATE ; 

This procedure creates a new instance of the descriptor-to- */ 
/* descriptor-id table. */ 

2. create a new instance of DDIT; 

3. insert the database name together with the pointer to the 

new DDIT into the index table for DDIT's; 

4. end proc 



D.5 Part V. - The Data Abstraction for Cluster-Definition Table 

/* (1) Part V ; The Data Abstraction for CDT 

/* (2) Design : CDTM 

/* (3) Designers : T.M. Ozsu, A. Orooii, 2. Shi 

/* (4) Date : July 31, 1981 

/* 15) Modified : Aug. 7. 1981 

/* , Sept. 11, 1981 

/* (6) Purpose : 

/* This is the data abstraction for cluster-definition table. 

/* Operations on CDT are done via the procedures in this 
/* abstraction. 



*/ 

*1 

*/ 

*1 

*/ 

*/ 



(8) Procedure Hierarchy for CDTM 
CDTM 
1 


T 

1 








CDTM$ 


CDTM$ 


1 

CDTM$ 


CDTM$ 


FIND_SIN|LE_CLUS 

+ +- 


FIND ADDRESS 


INSERT NEW CLUSTER 


CREATE 



CDTM$ 

MINCLUS 
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(9) Data Structures 

/* The data structure definitions are included in the program */ 
/* specifications. */ 



(10) Program Specifications 



mod CDTM ; 

programs FIND SINGLE CLUS, FIND ADDRESS, 
INSERT JTEW_CLUSTER7 CREATE; 
datasets ECDT, descriptor table (DT), 

descriptor-to-cluster map (DTCM) 

end mod ; 



7.6.1 proc FIND SINGLE CLUS( input : desc id group, output : clustered); 

T* This procedure finds the cluster whose descriptor- id set */ 

/* matches desc id group */ 

7.6.2 scalar stop : boolean; 

7.6.3 scalar index : integer; 

7.6.4 scalar mindesc : character; 



7.6.5 

7.6.6 



7.6.7 

7.6.8 

7.6.9 

7.6.10 

7.6.11 



7.6.12 

7.6.13 

7.6.14 

7.6.15 

7.6.16 

7.6.17 

7.6.18 

7.6.19 

7.6.20 

7.6.21 

7.6.22 

7.6.23 

7.6.24 

7.6.25 

7.6.26 

7.6.27 

7.6.28 

7.6.29 



cluster id := null; 

perfoniO lINCLUS(desc id group, mindesc) ; 

/* Among the descriptor ids in desc id group, find the */ 

/* id whose descriptor participates in defining the */ 

/* smallest number of clusters */ 

do begin /* this loop looks at each cluster whose */ 

/* descriptor- id set contains mindesc */ 
pick next entry in DTCM for this descriptor; 
pick the entry in ECDT pointed at by cdtptr in 
the current DTCM entry; 

if ECDT(cdtptr ) .no desc » no. of descriptors in desc id group 
then begin the descriptor- id set for this cTuster */ 

/* may match since it has the same number */ 

/* of descriptors */ 

index := 1; 
stop := 'false'; 

do /* look at each descriptor id in */ 

/* descriptor-id set */ 

if descriptor id currently pointed at by descptr 

desc id group( index) 

then stop := 'true' /* no match; stop */ 
else begin /* match, pick next descriptor id */ 
/* in each list */ 

index := index + 1; 
update descptr to point to 

next descriptor id; 

end if 

until (en5~~o~f~ descriptors in desc id group) or (stop); 
ii not stop # /* see if there was a match 

then begin /* there was a match */ 

clustered := cluster id at ECDT( cdtptr ) ; 
return ; 
end TF 

end if 

until (no more entries in DTCM for this descriptor); 
end proc ; 



12.15.1 proc FIND ADDRESS( input : desc id group, output : addresses); 

T* This procedure Finds tTTe addresses of the records in */ 
/* clusters whose descriptor- id set contain desc id group */ 

12.15.2 scalar index : integer; 

12.15.3 scalar stop rboolean; 



12.15.4 



addresses := null; 
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12.15.5 



12.15.6 

12.15.7 

12.15.8 

12.15.9 

12.15.10 

12.15.11 

12.15.12 

12.15.13 

12.15.14 



pertopa MINCLUS(desc id group, mindesc); 

/* Among the descriptor ids in desc id group, find the */ 

/* id whose descriptor participates in defining the */ 

/* smallest number or clusters. */ 

do begin /* do for all the clusters whose descriptor-id */ 

/*8et contain mindesc # */ 

pick next entry in DTCM for this descriptor; 
pick next entry in ECDT pointed at by cdtptr in 
the current DTCM entry; 

if ECDT(cdtptr ) .no desc >- no. of descriptors in desc i d g roup 
then begin /** the descriptor- id set for this cluster */ 

/* may contain desc id g roup */ 

index :* 1; 
stop : = 'false'; 

do /* look at each descriptor id in the */ 

7* descriptor- id set for this cluster */ 
if descriptor id currently pointed at by descptr > 



12.15.15 

12.15.16 

12.15.17 

12.15.18 

12.15.19 

12.15.20 

12.15.21 



12.15.22 

12.15.23 

12.15.24 

12.15.25 

12.15.26 



12.15.27 



12.15.28 

12.15.29 

12.15.30 

12.15.31 



desc id group( index) 

then stop : = 'true'; 

/* descriptor- id set does not contain */ 
/* desc ia group( index) */ 

else if descriptor id pointed at by descptr * 

desc id g roup( index) 

then begin 

T* match: look at next id in */ 

/* both lists */ 

index := index + 1; 
update descptr to point to next 
descrptor id; 
end begin 

else update descptr to point to next 
descriptor id; 

/* keep looking in the */ 

/* descriptor-id set */ 

/* for the cluster */ 
end if 

end if 

until vend of descriptors in desc id group) 
or (descptr - null) or (stop); 
if (index > no. of desc. in desc id g roup) 

then /* the search was successful; */ 

/* add addresses of the records */ 

/* in this cluster to the list of */ 

/* those which qualify */ 

addresses :« addresses + 

addresses pointed at by addrptr in 
ECDT(cdtptr); 

end if 
end if 

until (no more entries in DTCM for this descriptor); 
end proc ; 



1. proc MINCLUS( input : desc id group, output : mindesc); 

T* Among the descriptor ids in desc id group, this procedure */ 

/* finds the id whose descriptor participates in defining the */ 

/* smallest number of clusters. */ 

2. scalar min : integer; 

3. find the first descriptor id in desc id group in DT (call it cur_desc); 

4. min : = DT(cur_desc) .no_clus ; 

5. mindesc := current descriptor id; 

6* do begin /* do for all ids in desc id group */ 

7. find next descriptor id in desc id g roup in DT; 

8. if_ DT(cur desc) .no clus < min 7 * see if the current one is min */ 

9. then begin yes, make the current one min */ 

10. min := DT(cur__desc) .no_clus ; 

11. mindesc := current descriptor id; 

12 . end if 

13. unt il (no more ids in desc id group); 

14. re turn (mindesc) ; 

15 . end~~proc ; 



OO'-j cr>v-n^>UJ 
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1. proc INSERT NEW _C LUSTER (input : desc__id_set, cluster_id, 

output : cdtptr); 

/* This procedure inserts a new cluster into ECDT and updates */ 

/* all the other tables accordingly. It returns a pointer to */ 

/* this new entry. */ 

2. sort desc id set in ascending order of descriptor ids: 

7 * updateHECUT table *7 

create a new ECDT entry (call it new_cdt); 
new_ cdt . clustered := clustered; 

new_cdt.no desc := no of descriptors in desc_id_set; 
form a linked list of descriptor ids in desc id^set; 

update new_cdt .descptr to point to the linkecT iTst of descriptor ids; 
add new cdt to the ECDT list; 

/* update DTCM */ 

9. create DTCM entries for all descriptor ids in 

desc_id_set (call it new_dtcm) ; 

10. set cdtptr of all new_dtcm's to point to the new__cdt entry; 

11. add new_dtcm entries to their respective DTCM lists for each 

descriptor id; 

/* update DT table */ 

12. update cluster counts (no clus) of DT entries for the 

descriptor ids in Hesc_id_set ; 

13 . end proc ; 



1. proc CR£ATE( output : cdtptr); 

7 * This procedure creates a new ECDT and returns a pointer to it */ 

2. create a new instance of ECDT; 

3. insert th€* database name together with the pointer to the 

nev ECDT into the index table for ECDT's; 

4. return the: pointer to the new ECDT; 

5 . end proc ; 
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